When students enter an examination room, they do not leave their
personalities at the door. This fact is brought out by a recent experi- 
ence of the writer. In trying to assist a student who was distressed 
by an unduly low grade on a true-false test, he casually examined the 
errors made by the student. Of forty-four errors, thirty-five were on 
false items incorrectly marked “true,” while only nine times in the
test had the student marked true statements “false.” Such striking
instances may be rare, but they indicate unmistakably that testers 
must study the reactions of the student as a whole. In this case, 
merely studying the response-pattern with the student made his 
difficulty clear, and perhaps will motivate him to be less gullible in 
responding to plausible statements in the future. For the person 
constructing tests, a more fundamental question arises: Did the test 
fairly measure the student’s knowledge? Obviously, if he had been 
marked only on his responses to true statements, he would have had a 
high grade, whereas his responses on only the false items would have 
placed him even lower than did the total score. It is easy to moralize 
about the desirability of penalizing uncritical thinking, but a score 
which conceals personality tendencies within an alleged measure of 
knowledge is apt to mislead both teacher and student. When a 
student is faced with a difficult decision on a test item, he will often 
guess, marking the alternative which seems best. But where one 
student characteristically is “impartial,” guessing “true” as often
as “false,” another, like the student above, may habitually mark
“true” on most of his guesses. From the viewpoint of merit, there
is little choice between these students, but under some circumstances
the second student will be severely penalized, not for greater ignorance,
but because of his acquiescent habits.

Evidence from previous investigations indicates that some factor
other than knowledge alone affects the student’s response to a true- 
false statement. In a recent study employing a true-false test, the 
writer was led to divide the test (during scoring) into two parts, one 
composed of false items; the other, true items. The reliability and 
validity of the score based on the “false-item” test were far greater
than those based on the “true-item” test, and greater than those of
the test taken as a whole.* This finding is based on only one test,
but suggests that some psychological difference between responses
of “true” and “false” reduces the validity of the true items, and,
therefore, of the test itself. The true-false technique is in many ways 
convenient to the tester, but evidence in general shows it inferior 
to other objective techniques. If the finding cited is substantiated, 
and if the factor responsible can be identified and controlled, it may be
possible to make the true-false test a far more useful tool. 

Support for the viewpoint that responding “true” and “false” may
be psychologically different acts is found in studies by Fritz and others,
who have shown repeatedly that when students guess they respond
“true” more often, on the average, than “false.”
This would cause the guessing student, even if totally uninformed, 
to receive a positive score on items keyed true, but would cause him 
to guess wrong on more than half of the items keyed false. Rundquist’s 
investigations in the area of personality encountered a similar
situation.11 He considered separately the testee’s responses
(Agree-Uncertain-Disagree) to “acceptable statements” suggesting favorable
aspects of a problem and to “unacceptable statements” dealing with
the same idea, but presenting it in a pessimistic form. Tests of such
areas as inferiority, education, and morale were used to identify persons
of low self-confidence. Responses to unacceptable items correlated
higher with responses to other unacceptable items than with acceptable
items, suggesting a psychological difference, and the unacceptable
items possessed greater validity than acceptable items as judged by 
discrimination between criterion groups. Lentz, also working with 
personality tests, has found individual differences in the tendency 
to respond “true” rather than “false.” He has labelled this trait
“acquiescence.” The highly acquiescent student may be contrasted
with the student whose responses are unbiased, and who guesses as
often “true” as “false”; even further along the scale is the
overcritical student, who looks for minor exceptions and tends to
respond “false” rather than “true.” If such a trait as acquiescence
is operating in true-false achievement tests, it might be expected to
reduce the validity of testing. 

This article reports investigations into the implications of these 
findings from several directions, seeking to answer at least tentatively 
these questions: 

(1) Are true items in general less valid and reliable than false items? 

(2) Is acquiescence in true-false tests a consistent individual per- 
sonality difference, or is it merely a tendency, present in similar degree 
in all students, to accept statements when in doubt? 

(3) Can the true-false test be improved by directions which are 
intended to eliminate acquiescent behavior? 

The first question calls for verifying the previous comparison of 
“true tests” and “false tests” by study of further examinations. The
second question calls for examining responses of students in different
tests, to determine the consistency of differences. An experimental 
study has been made in an attempt to answer the third question. 
In addition to these lines of evidence, some comment has been made 
on the psychological and statistical implications of these hypotheses, 
looking toward changes in true-false test procedure. 


I. A COMPARISON OF TRUE AND FALSE TESTS 


Procedure.—Any true-false test may be considered as composed of a 
“true test,” containing the items keyed true, and a “false test,”
composed of the items keyed false. In any test, two scores may be
obtained representing the student’s performance on these two parts,
which may be called the “Trues” score and “Falses” score. The
reliability and other statistical information about these scores may be 
obtained separately, following usual procedures. 
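
As a minimal sketch of this split, the Python below scores one response sheet against the key and returns the two part scores. The names `split_scores`, `key`, and `responses` are hypothetical, and scoring each part as the number of items answered correctly is an assumption; the article does not state the exact scoring rule used for the part scores.

```python
def split_scores(key, responses):
    """Return (trues_score, falses_score) as counts of correct answers.

    key       -- list of booleans, True where the item is keyed true
    responses -- list of True, False, or None (None = item omitted)
    """
    trues_score = 0
    falses_score = 0
    for keyed, marked in zip(key, responses):
        if marked is None:          # omitted items earn no credit
            continue
        if marked == keyed:
            if keyed:
                trues_score += 1    # true item correctly marked "true"
            else:
                falses_score += 1   # false item correctly marked "false"
    return trues_score, falses_score


# Hypothetical six-item test: three items keyed true, three keyed false.
key = [True, True, True, False, False, False]
# An acquiescent student who marks every item "true" except the last.
responses = [True, True, True, True, True, False]
print(split_scores(key, responses))  # (3, 1)
```

The invented student looks well informed on the "true test" and poorly informed on the "false test", which is the pattern described in the opening anecdote.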

In order to establish whether the difference between the “Trues”
score and “Falses” score previously found is peculiar to the test there
studied, or is peculiar to tests prepared by one instructor, or is generally 
true, a large number of true-false tests actually used in college class- 
rooms were obtained for study. As shown in Table I, a variety of 
subjects is represented. While these tests are not necessarily repre- 
sentative of all true-false tests, some generalization is possible. It 
should be noted that these are not all good tests; some were prepared 
casually, as many classroom tests are prepared, and some are parts 
of larger examinations. Except for the sociology examinations, and 
two of those in economics, each test was used with different 
students. 

Results.—The data given in Table I lead to these conclusions:

(1) Each test contains more false items than true items, except 
Tests 1 and 2, where the numbers were deliberately equated, since 
the tests were to be used in experiments. 

(2) When correction is made for the number of items entering the 
Trues score and Falses score, the mean score on the true items (Col. 1) 
is greater than the mean score on the false items (Col. 2) in Tests 
1, 2, 4, 6, 7, 8, and 10. 

(3) When a similar correction for length is applied to the standard
deviations of the True and False scores (Col. 3 and 4), the standard 
deviation of the Falses score is greater for eight of the ten tests (all 
except Tests 1 and 5). 

(4) The split-half reliability of the Falses score is often greater 
than that of the total test, and is, except for Tests 5 and 8, greater 
than that of the Trues score. The critical ratio of the difference in 
reliability was tested by applying Fisher’s z-transformation to the 
half-scores, since the Spearman-Brown formula tends to introduce 
an element of error. This procedure for testing significance is con- 
servative. The critical ratio so obtained exceeds 2.0 only for Tests 1, 
2, and 6. The consistency of the differences from test to test, even when
allowance is made for the slightly smaller lengths of the True tests, 
suggests that significant differences would be found more frequently 
if larger groups were used. The “social significance” of the
differences is unquestionable, when passing from true to false items
changes reliability coefficients from 0.1 to 0.7, or 0.2 to 0.7, in some
cases.

(5) Data from Test 5 and Test 8 do not support the hypothesis 
that the Falses score is superior in reliability to the Trues score. This 
may be evidence against the hypothesis. It may be due to the nature 
of the subject-matter tested, which is highly specific; it is noticeable 
that Test 5 is more reliable than other, longer tests. The finding 
may be the result of the high proportion of false items used in these 
two tests (over sixty per cent). Finally, the discrepancy may be 
considered as a chance deviation, perhaps due to the particular split 
used. In this connection it is significant that the correlation of Trues 
with Falses for Tests 5 and 8 is greater than the “‘self-correlation”’ of 
either score. 

(6) The correlation of the Trues score with the Falses score is low 
in every case, and is usually low when corrected for attenuation. In 
the case of Test 4, this correlation is negative. 
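
The split-half procedure mentioned in conclusion (4) can be sketched as follows, assuming an odd-even split of 0/1 item scores and the standard Spearman-Brown step-up. The helper names and the five-student data set are invented for illustration and are not taken from Table I.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length reliability."""
    return 2 * r_half / (1 + r_half)


def split_half(item_scores):
    """item_scores: one list of 0/1 item scores per student.

    Returns (half-score correlation, stepped-up reliability).
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return r_half, spearman_brown(r_half)


# Hypothetical item scores for five students on a six-item part test.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
r_half, reliability = split_half(scores)
print(round(r_half, 3), round(reliability, 3))
```

Applying `split_half` once to the items keyed true and once to the items keyed false would give the two reliabilities that the conclusions compare.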

In a previous paper, an analysis of the effects of acquiescence
was made on a basis of probability. This mathematical analysis 
indicated that this factor, if present, would cause the mean score on 
the true items to increase, and that of the false items to decrease. 
Further, one would expect the standard deviation of the false items 
to be greater than that of the true items. The student who lacks 
knowledge will, on the whole, guess on more items than the informed 
student. When guessing, this poor student will guess right more often
than wrong on true items, but will lower his score on false items. As a
result, acquiescence reduces the range between good and poor students 
on true items, but increases the spread on false items. This tendency 
would be expected to decrease the reliability and validity of the Trues 
score, increasing that of the Falses. If the numbers of true and false
items are equal, acquiescence would not affect the mean total score,
but would tend to decrease the range of scores. All the expectations 
resulting from the theoretical analysis are supported by the findings 
above. Means, standard deviations, and reliabilities are such as 
would result from the influence of acquiescence. 
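
The direction of these effects can be illustrated with a simple expected-value sketch. It assumes a deliberately crude model that is not the author's own derivation: each student knows a fraction `p_known` of the items and, on the rest, marks "true" with probability `k_true`. The value 0.62 is borrowed from the figure attributed to Fritz later in the article, and the two students are hypothetical.

```python
def expected_correct(p_known, k_true):
    """Expected proportion correct on true items and on false items.

    p_known -- fraction of items the student actually knows (assumed
               equal for true and false items; a simplifying assumption)
    k_true  -- probability of marking "true" when guessing
    """
    on_true = p_known + (1 - p_known) * k_true         # a guess of "true" is right
    on_false = p_known + (1 - p_known) * (1 - k_true)  # a guess of "false" is right
    return on_true, on_false


K = 0.62                             # illustrative guessing bias
good = expected_correct(0.8, K)      # hypothetical well-informed student
poor = expected_correct(0.4, K)      # hypothetical poorly informed student

spread_true = good[0] - poor[0]      # gap between the students on true items
spread_false = good[1] - poor[1]     # gap between the students on false items
print(round(spread_true, 3), round(spread_false, 3))  # 0.152 0.248
```

Under this model the gap between the two students is smaller on true items than on false items, and the average of the two part scores does not depend on `k_true`, which agrees with the statements about spread and about the mean total score.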

Validity was studied for three tests where a criterion was available. 
For Test 1, the sum of the student’s T-scores on the other psychology 
examinations he took during the semester was used as a criterion of 
validity. None of the other tests was true-false, but both essay and 
objective forms were included. For thirty-two cases, the validity of 
the Trues score was 0.222; that of the Falses score, 0.666; and that of 
the total test score, 0.670. The critical ratio of the difference between 
Trues and Falses is 2.24 by the z-test. For Test 2, the validities of the 
three scores, using a similar criterion, were 0.319, 0.700, and 0.598, 
respectively (fifty-seven cases); the difference between Trues and Falses
has a critical ratio of 2.74. For Test 4, the criterion used was obtained 
by summing letter grades from other tests. By a rank-difference 
method, the validity correlations for twenty-one cases were: Trues, 
0.002; Falses, 0.362; Total score, 0.303. The critical ratio of the first 
difference is 1.22. While the criteria used here are not the most 
desirable, the way in which these data support the remaining findings 
is strong evidence for the hypothesis. 
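
The critical ratios quoted for Tests 1 and 2 can be approximated with Fisher's z-transformation. The sketch below assumes the usual standard error for two independent correlations, which is only an approximation here because both correlations come from the same students; the article does not give the exact formula used. Under that assumption the ratios come out near 2.2 and 2.8, close to the reported 2.24 and 2.74.

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


def critical_ratio(r1, n1, r2, n2):
    """Difference of two z-transformed correlations over its standard error.

    Assumes independent samples (an approximation for this study).
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r2) - fisher_z(r1)) / se


cr_test1 = critical_ratio(0.222, 32, 0.666, 32)  # Trues vs. Falses, Test 1
cr_test2 = critical_ratio(0.319, 57, 0.700, 57)  # Trues vs. Falses, Test 2
print(round(cr_test1, 2), round(cr_test2, 2))
```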

Conclusions.—Although Tests 5 and 8 contradict the data from 
other tests, it appears safe to conclude that a score based solely on
false items is more reliable and valid than a score based solely on true
items, and that the Falses score is as reliable and valid as the total
score on a true-false test, although the latter is twice as long. These
results are consistent with the theory that acquiescence, the tendency
to answer “true” rather than “false” when in doubt, operates in
determining the student’s score.

A counter hypothesis, not here disproved, is that it is easier to
compose valid false items than valid true items. This implies that the
differences found above relate to the failure of the test-maker to 
develop sufficiently implausible true items. While this theory 
explains the data about as well as the acquiescence hypothesis, it seems 
unlikely that the same weakness would be found in items from so 
many instructors. Nevertheless, greater attention to drafting true 
statements which are not obviously true might improve testing. 


II. THE CONSISTENCY OF ACQUIESCENT BEHAVIOR


In thinking about the tendency to accept rather than reject state- 
ments, two points of view may be distinguished. Hypothesis A may 
be stated as follows: 


Whenever students make a pure guess, there is probability k that any
response will be true rather than false, or that k per cent of their combined
responses will be true. The value of k is substantially the same for all
students. This probability is less certain where the student has partial
knowledge of an item, but, in general, one student will be about as
acquiescent as another, provided that they guess an equal amount.


Hypothesis B emphasizes the role of individual differences, and
suggests that, due to past experience with true-false tests or to general 
gullibility or criticalness, students possess acquiescent behavior to 
varying degrees. It may be formulated: 


When any student makes a pure guess, the probability that he will respond 
“true” rather than “false” is k. The value of k is different for each student,
although for most students it is greater than one-half. 


These theories, stated in terms of pure guesses, seem trivial, since 
responses on true-false tests, even when uncertain, are determined 
at least in part by knowledge and by the tone of the item. Never- 
theless, any uncertain response is in part a guess, and to that degree 
may be influenced by the tendency to acquiescence. 

One approach to determine whether A or B is the more valid 
hypothesis is to determine whether students who respond true more 
often than false do so on test after test. Three tests were available 
which had been given, on separate occasions, as regular examinations 
in a sociology course. Two other tests had been given to another 
group. The excess number of a student’s answers of “true” over
his answers of “false” may be taken as a measure of his acquiescence.
The very acquiescent student (assuming Hypothesis B) will have a 
large score, while the very critical student will have a negative score. 

Results.—The “acquiescence score” (excess of “trues” over
“falses”) was obtained for every student taking two or more of the
tests. The correlations of acquiescence scores from test to test were
as follows: Test 1* with Test 2 (one hundred nine cases), 0.416;
Test 1 with Test 3 (eighty-six cases), 0.576; Test 2 with Test 3 (ninety
cases), 0.364; Test 4 with Test 5 (eighteen cases), 0.61. All
correlations are significant at the one-per-cent level.
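
A minimal sketch of the acquiescence score and its test-to-test correlation is given below. The function names are hypothetical, and the two lists of scores are invented to show the computation; they are not the data behind the coefficients reported above.

```python
import math


def acquiescence_score(responses):
    """Number of items marked "true" minus number marked "false".

    responses -- list of True, False, or None (None = item omitted)
    """
    trues = sum(1 for r in responses if r is True)
    falses = sum(1 for r in responses if r is False)
    return trues - falses


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# One hypothetical answer sheet: 3 "true", 1 "false", 1 omitted.
print(acquiescence_score([True, True, False, None, True]))  # 2

# Hypothetical acquiescence scores of the same five students on two tests.
first_test = [10, 4, -2, 6, 0]
second_test = [8, 2, 0, 3, -1]
r = pearson(first_test, second_test)
print(round(r, 2))
```

A positive coefficient of this kind is what the study takes as a sign that students keep their relative standing in acquiescence from one test to the next.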

Conclusions.—These values demonstrate clearly that there is some 
factor underlying the student’s behavior in each true-false test which 
causes his answers to be consistently more or less acquiescent than 
those of his neighbors. If these values had been very small, Hypothe- 
sis A would have been supported; being positive, these results support, 
but do not prove, Hypothesis B. Even if acquiescence were a universal 
characteristic, present in everyone to the same degree, it could not 
affect the test score of the student who has complete knowledge of 
every item. Only when a response is in part a guess does this tendency
affect responses. Under Hypothesis A, then, the student who must
guess most often will show the greatest excess of “trues” over “falses.”
The tendency to be uncertain and, therefore, to guess will be present 
in all tests, and may explain the correlations found. 

The very fact that individual differences in acquiescence score 
are found is support for Hypothesis B. Under Hypothesis A, it is 
difficult to account for the fact that almost always a minority of 
students is found who respond false more often than true. In Test 
6—Table I—for example, the perfectly informed student would mark 
forty-three items “true,” and fifty-seven items “false,” receiving
an acquiescence score of -14. Scores actually found among one
hundred thirteen cases range from -34 to +34; twenty-six students
had scores lower than -14.
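
The baseline used in this argument is simple arithmetic, sketched below for the Test 6 figures. The helper `relative_acquiescence`, which measures a student's score from that baseline, is a hypothetical convenience and is not a quantity defined in the article.

```python
def informed_baseline(n_true_items, n_false_items):
    """Acquiescence score of a student who answers every item correctly."""
    return n_true_items - n_false_items


def relative_acquiescence(score, n_true_items, n_false_items):
    """Observed acquiescence score measured from the informed baseline."""
    return score - informed_baseline(n_true_items, n_false_items)


# Test 6: forty-three items keyed true, fifty-seven keyed false.
print(informed_baseline(43, 57))           # -14
print(relative_acquiescence(-34, 43, 57))  # -20, the most critical student found
print(relative_acquiescence(34, 43, 57))   # 48, the most acquiescent student found
```

Scores below the baseline mean that a student marked "false" more often than the key itself would require.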

It may be concluded, from our general knowledge of behavior as 
much as from these data, that it is reasonable to look upon acquies- 
cence as a continuous trait ranging from great acquiescence to great 
criticalness. The crucial experiment to test the two hypotheses must 
employ some such technique as that of Fritz, who studied responses
to artificial items where every student lacked knowledge, so that
guessing was always required.

* These numbers do not correspond to the test numbers in Table I.


III. AN EXPERIMENTAL TRIAL OF NEW DIRECTIONS


Procedure.—Directions designed to eliminate acquiescence were
tested experimentally. The study followed in many respects one by
Dunlap, De Mello, and Cureton. They prepared a test which
students took three times, under each of three forms of directions,
one of which was “Do not guess, but after you have marked all the
items you are certain of, count your ‘true’ and ‘false’ responses and
mark all remaining items with the least-used symbol.” Their data
showed these directions superior to instructions to guess, but slightly 
inferior to do-not-guess instructions. 

For the present study, a regular examination in general psychology, 
covering three chapters, was used. Items were prepared by two 
instructors and combined into a total test without preliminary trial. 
Sixty-eight items were prepared, divided equally between true and 
false, but in duplicating the test one item was repeated unintentionally, 
and another omitted. The second appearance of the item was dis- 
regarded in scoring the test. Students in two classes were tested with 
this instrument as a regularly announced examination. Students in 
alternate rows were given tests with these directions (Form A): 


This is a true-false test. Circle the T before a statement if it is always 
true. Circle the F if the statement is false in any way. If you are at all 
uncertain whether an item is true or false, LEAVE IT BLANK. DO NOT 
GUESS. Your final score on the test will be the number answered right minus 
the number answered wrong. 


The remaining students were given the experimental directions 
(Form B): 


This is a true-false test. Circle the T before a statement if it is always
true. Circle the F if the statement is false in any way. Mark ONLY items
you are CERTAIN of the first time through the test. THEN

1. Count the number of items you have marked true and the number marked false.

2. GUESS at all remaining items, indicating your answer by underlining
the T or F. Balance your guesses, so that when you have finished, you will
have marked exactly thirty-four items true and thirty-four false. The
statements in the test are exactly divided, half of them being true.


This pattern of response was intended to force the highly acquiescent 
and highly critical student to discard that behavior, so that the per- 
sonality factor would be held constant. 
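
The balancing rule in the Form B directions can be sketched as a small calculation. The function name and the three example students are hypothetical; the sketch only shows how many of the remaining items the directions leave a student free to mark each way.

```python
def required_guesses(certain_true, certain_false, n_items=68):
    """Remaining marks needed to finish with equal numbers of "true" and "false".

    Returns (guess_true, guess_false), or None when balance is impossible
    because one symbol was already used on more than half of the items.
    Assumes n_items is even, as in the Form B directions.
    """
    target = n_items // 2
    guess_true = target - certain_true
    guess_false = target - certain_false
    if guess_true < 0 or guess_false < 0:
        return None
    return guess_true, guess_false


print(required_guesses(25, 25))  # (9, 9): an unbiased student guesses evenly
print(required_guesses(30, 20))  # (4, 14): most guesses are forced to "false"
print(required_guesses(40, 20))  # None: the target cannot be met
```

The second case shows how a student who has already marked many items "true" has most of the remaining responses dictated by the directions.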

Results. (a) Compliance of Students.—The responses of the
students suggest that the directions were followed closely. An
inadvertent failure to capitalize the instructions in Form B to underline
guesses caused this to be frequently overlooked; consequently, no 
analysis was made of such indications. Of the sixty-eight students 
directed to not guess (Form A), fifty omitted one or more items. In 
the B group, under do-guess instructions, only eight of the seventy-two 
omitted any item, and only two omitted as many as three. The 
instructions to respond “true” as often as “false” (Form B) were
followed less precisely; thirteen of the group used more “trues”
than “falses.”

(b) Comparison of Group Means and Sigmas.—The mean score 
(right-minus-wrong) for Group A was 37.3; standard deviation, 8.30. 
The mean score (right-minus-wrong) for Group B was 36.6; standard 
deviation, 10.50. The difference in means has a critical ratio of only 
0.44. The difference in standard deviations is 1.95 times its standard 
error; while not statistically significant, this evidence supports the
theoretical analysis indicating that acquiescence reduces the total range
of scores.* 
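
The two critical ratios in this paragraph can be reproduced from the reported means, standard deviations, and group sizes (sixty-eight students in Group A, seventy-two in Group B). The sketch assumes the conventional large-sample standard errors; the article does not name the formulas it used, so their agreement with the reported 0.44 and 1.95 is suggestive rather than conclusive.

```python
import math


def cr_means(m1, s1, n1, m2, s2, n2):
    """Difference in means divided by its standard error."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return abs(m1 - m2) / se


def cr_sigmas(s1, n1, s2, n2):
    """Difference in standard deviations divided by its standard error,
    using the large-sample approximation SE(s) = s / sqrt(2n)."""
    se = math.sqrt(s1 ** 2 / (2 * n1) + s2 ** 2 / (2 * n2))
    return abs(s1 - s2) / se


# Group A: mean 37.3, sigma 8.30, n = 68; Group B: mean 36.6, sigma 10.50, n = 72.
ratio_means = cr_means(37.3, 8.30, 68, 36.6, 10.50, 72)
ratio_sigmas = cr_sigmas(8.30, 68, 10.50, 72)
print(round(ratio_means, 2))   # 0.44
print(round(ratio_sigmas, 2))  # 1.95
```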

(c) Comparison of Reliabilities.—The odd-even reliability of Form
A, after the Spearman-Brown formula was applied, was 0.616. The 
reliability of Form B was 0.553. The difference was tested for sig- 
nificance by application of a z-transformation to the half-scores; the 
critical ratio was only 0.45. 
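
Because the test of significance is applied to the half-scores, the stepped-up reliabilities must first be converted back to half-score correlations. The sketch below inverts the Spearman-Brown formula and then applies Fisher's z, assuming group sizes of sixty-eight and seventy-two and the independent-sample standard error; these choices are assumptions, and the result lands close to, though not exactly on, the reported 0.45.

```python
import math


def step_down(r_full):
    """Invert Spearman-Brown: full-length reliability -> half-score correlation."""
    return r_full / (2 - r_full)


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


r_half_a = step_down(0.616)  # Form A
r_half_b = step_down(0.553)  # Form B

# Standard error of the difference between two independent z values.
se = math.sqrt(1 / (68 - 3) + 1 / (72 - 3))
cr = (fisher_z(r_half_a) - fisher_z(r_half_b)) / se
print(round(r_half_a, 3), round(r_half_b, 3), round(cr, 2))
```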

(d) Comparison of Validities.—As a crude criterion of validity, the
sum of the T-scores of students on the remaining examinations of the 
semester was used. These data were available for only one class, 
which included thirty-two students in Group A and thirty-one stu- 
dents in Group B. For these cases the correlation with the criterion 
for Form A was 0.670; for Form B, 0.620. By the z-test, this differ- 
ence has a critical ratio of only 0.33. 

Conclusions.—This evidence appears at first glance to overthrow 
the acquiescence hypothesis, since holding that factor constant 
decreased reliability and validity. A more careful analysis shows 
that this result could have been predicted, since the experimental 
directions were poorly conceived. When the acquiescent student
responds to items, it is most unlikely that he will mark only those of 
which he is absolutely certain. Rather, he may be expected to mark 
many items that he thinks he knows, when he is actually misled by 
his gullibility to overlook exceptions which other students might 
recognize. After all students have marked those items on which they
think they are correct, one would expect to find the acquiescent
student to have marked more false items “true” than vice versa.
Then, when forced to balance his responses, on the items he does not
know, he will be forced to make further errors, marking true
statements “false” to compensate. In effect, these guessed responses
are determined for him by the responses he considered certain, so
that for the acquiescent student the number of degrees of freedom
is reduced. For the non-acquiescent student, who responds “true”
and “false” equally often, the number of degrees of freedom equals
the total number of items in the test, as it does under normal directions. 
The result of this factor in the experimental directions is to shorten 
the test for the overacquiescent or overcritical student, thus reducing 
reliability and validity. A similar factor may explain the failure of 
the novel directions proposed by Dunlap, De Mello, and Cureton 
to improve the accuracy of testing. 


DISCUSSION OF IMPLICATIONS 


Findings.—The data gathered in these studies, taken as a whole, 
reinforce the belief that the tendency to respond “true” more often
than “false” affects scores of students on true-false tests. Evidence
supports, but does not prove, the viewpoint that this tendency is a 
trait present in different amounts in different students. False items 
are on the whole superior to true items, as the reliability and validity 
of scores based on false items are usually greater than those based on 
true items. Discovery that, within a test of sixty-seven items having 
a reliability of 0.616, the thirty-four false items scored alone would 
give a reliability of 0.723, while the thirty-three true items give only a 
reliability of 0.111, is certainly worth the attention of those using 
true-false tests. True items and false items do not measure the same 
behaviors, which means that one or both is in part invalid. Further 
research will be needed to clarify many questions raised by these 
explorations. At present, it is significant to translate these findings 
into suggestions for improving the true-false test. 

Suggestions for Improving the Validity of Testing.—Directions 
intended to eliminate the effect of acquiescence proved useless, raising 
the question whether any other approach can control this factor. It 
is doubtful whether control through other novel directions will solve 
the problem, but some other more or less useful procedures suggest 
themselves. First, one may use more false items and fewer true items
in the test, in order to obtain maximum validity. This assumes that
the tendency to acquiescence will remain the same even if more than 
half the items are keyed false. While the student may tend normally 
to mark sixty per cent of the items true, one cannot assume that this 
tendency will continue to operate if the student, on the basis of his 
knowledge, has already marked many items false. Certainly, if the 
test were composed of only false statements, the student would tend 
to become suspicious. Nevertheless, if a set of false items has as 
great validity as a mixed set of double the length, some increase in 
the percentage of false items may be warranted. In particular, when 
framing difficult items that can be cast as easily in a negative form as 
positive it may be well to use the latter; thus “Jackson succeeded 
Van Buren in the presidency” is more likely to be missed by the 
person without knowledge than “Van Buren succeeded Jackson, etc.”
If students are given a chance to study the key, they may eventually 
become test-wise to the extent that they mark the majority of items 
false, but the instructor should be able to control this factor. 

Second, the correction-for-guessing formula may be revised to allow 
for the relations discovered above. The usual $R - W$ formula
assumes that the student is as likely to guess correctly as incorrectly
on any item. Actually, the average student is more likely to guess
right on a true item, and more likely to guess wrong on a false item.
If one can assume that Fritz’s figure is typical, sixty-two per cent of
guesses will be “trues.” When guessing on items keyed true, the
typical student will be correct sixty-two per cent of the time, wrong
thirty-eight per cent. For true items, the correct formula to counteract
guessing is $X_t = R_t - \frac{62}{38} W_t$. For false items, the similar
formula is $X_f = R_f - \frac{38}{62} W_f$. One may apply these formulas
to each student by counting separately the number of true items marked
“false” ($W_t$), and the number of false items marked “true” ($W_f$).
The combined formula is

$$\text{Total score} = \text{rights} - \tfrac{62}{38}\, W_t - \tfrac{38}{62}\, W_f. \tag{1}$$


If it is more convenient to count both types of errors together, a
simplified approximation may be used. One may assume that each
student’s guesses of “true” are divided between the true and false
items in the same proportion as these appear in the key, and that his
guesses of “false” are similarly divided. If the test contains $a$ true
items and $b$ false items,

$$W_t = \frac{38a}{62b}\, W_f$$

and formula (1) becomes

$$\text{Total score} = \text{rights} - \frac{62a + 38b}{38a + 62b}\,\text{wrongs}. \tag{2}$$

This revised formula reduces to the familiar $R - W$ when the numbers
of true and false items are equal, but gives slightly different results
when a greater number of either true or false items is used. In six
of the ten tests studied in Part I of this report, the proportion of
false items was substantially greater than fifty per cent, which implies
that the usual correction formula introduces some error. Only rarely is
this error sufficiently great to be of importance. The $R - W$ formula
overpenalizes the acquiescent student when there are more false than
true items, and underpenalizes him when there are more true items.
The highly critical student, who tends to mark false when guessing,
is aided by the usual formula on most of these tests, but is penalized
when the test contains more true items than false.
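
Formulas (1) and (2) can be sketched in Python as follows. The function names are hypothetical, and writing the sixty-two per cent figure as a parameter `k` is a generalization introduced here for illustration; the article states the formulas only for that one value.

```python
def corrected_score_separate(rights, w_t, w_f, k=0.62):
    """Formula (1): penalize the two kinds of error separately.

    w_t -- number of true items marked "false"
    w_f -- number of false items marked "true"
    k   -- assumed proportion of guesses that are "true"
    """
    return rights - (k / (1 - k)) * w_t - ((1 - k) / k) * w_f


def corrected_score_pooled(rights, wrongs, a, b, k=0.62):
    """Formula (2): pool all errors; a = true items, b = false items in the key."""
    weight = (k * a + (1 - k) * b) / ((1 - k) * a + k * b)
    return rights - weight * wrongs


# With equal numbers of true and false items, formula (2) behaves like R - W.
print(round(corrected_score_pooled(50, 10, 34, 34), 2))  # 40.0

# With 43 true and 57 false items (as in Test 6), each error costs a little less.
print(round(corrected_score_pooled(50, 10, 43, 57), 2))  # 40.65
```

The second example shows the direction of the adjustment: when false items predominate, the pooled weight drops below one, so the plain $R - W$ score would penalize the typical guesser somewhat more heavily.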

The proposal to increase the proportion of false items, and the
proposal to revise the correction formula, give an advantage to the
highly critical student and increase the penalty on the unusually
acquiescent student. This is theoretically undesirable, just as is the
present practice which makes no allowance for acquiescence; it is
justifiable from the empirical point of view that this correction is, on
the average, more valid than the $R - W$ formula since the majority
of students are moderately acquiescent. A more defensible approach,
from the point of view of logic, would be to determine each student’s
acquiescence and modify the correction formula for each individual.
This could be done by imbedding in the test several “dummy” items
which could only be answered by a guess, thus determining whether
the pupil’s guesses are predominantly “true” or “false.” Such a
procedure is impractical, except in some research problems, and as a
result the revised correction formula proposed above seems the most
reasonable compromise. The suggested value of sixty-two per cent,
based on Fritz’s work, should be verified by further investigation.
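
The dummy-item idea is described only in outline, so the sketch below is one hypothetical way it might be carried out, not a procedure given in the article. It estimates a student's own guessing bias from the dummy items and derives individual penalty weights by analogy with the sixty-two per cent case; both function names and the sample answers are invented.

```python
def estimate_k(dummy_responses):
    """Fraction of unanswerable "dummy" items that the student marked "true".

    dummy_responses -- list of True, False, or None (None = omitted)
    """
    marked = [r for r in dummy_responses if r is not None]
    if not marked:
        return None                      # no guesses to judge from
    return sum(1 for r in marked if r) / len(marked)


def individual_weights(k):
    """Penalty weights for (true items marked "false", false items marked "true").

    Returns None when k is missing, 0, or 1, where a weight is undefined.
    """
    if k is None or k <= 0 or k >= 1:
        return None
    return k / (1 - k), (1 - k) / k


# Hypothetical student who marks six of eight dummy items "true".
k = estimate_k([True, True, True, False, True, False, True, True])
print(k)                      # 0.75
print(individual_weights(k))
```

With only a handful of dummy items the estimate of `k` would be very unstable, which is consistent with the remark that the procedure is impractical outside research settings.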

Third, one may improve the true-false test by training the student.
If the student can be shown that his acquiescence or criticalness is a
factor reducing his chance of obtaining a maximum score, he may be
able to control that tendency in future tests. What is required is a
revision of the student’s mind-set, rather than a mere revision of
directions such as was attempted above. Probably high-school 
teachers, and teachers of how-to-study courses, could assist their pupils 
by calling attention to the influence of this tendency. 

The reader may be inclined to favor a fourth possibility—rigid 
regulations against guessing. Many studies have shown empirical 
merit in “do not guess” instructions, but the justice of such directions
has been thrown into serious question by such discussions as those of
Swineford, and Gilmour and Gray. It seems clear that under
“do not guess” directions, many students continue to guess, which
means that the factor of acquiescence has not been controlled, while
the tendency to gamble or “submissiveness to directions” has been
introduced as an added factor reducing validity. 

Arnold has proposed making false items less obviously false, to 
reduce the preponderance of false-items-marked-true errors. This
suggestion seems unlikely to solve the most serious difficulty, the
unreliability of the Trues score.


SUMMARY 


The true-false test has been studied to determine whether the
hypothecated trait of acquiescence (the tendency to mark items
“true” rather than “false” when guessing) influences scores.
Experimental data and theoretical considerations show that this
tendency makes false items more valid and reliable than true items,
reduces the range of test scores when the numbers of true and false
items are equal, raises the mean score when a majority of items are
true, or lowers it when the majority are false, and causes the R − W
correction formula to be inappropriate in many cases. Practical
suggestions resulting from the study are: (1) Use of more false items
than true items to increase reliability and validity, (2) adoption of a
revised correction formula to neutralize more fairly the effects of
guessing, especially where precise results are desired, and (3) training
of the student to be alert to his own characteristic of acquiescence or
non-acquiescence so that he may control its effect. Further research is
needed to study the validity of these proposals, to determine whether
acquiescence as found in these studies is the same trait as that identified
by Lentz in personality and attitude tests, to determine whether
acquiescence of this type is related to responses to propaganda and
other life situations, and to study the development of acquiescent
behavior in individuals.
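
The effect of acquiescence on the mean score and on the R − W formula can be made concrete with a little arithmetic. The sketch below is an illustration added here, not a computation from the study. It assumes a pupil who guesses on every item and marks "true" with a fixed probability p; the function name and the item counts are hypothetical, and the value 0.62 is borrowed purely as an example guessing rate without reconstructing its role in the revised formula. Under these assumptions the expected value of R − W is (true items − false items) × (2p − 1), which is zero only when the test is balanced or the guessing is impartial.

```python
def expected_r_minus_w(n_true, n_false, p_true):
    """Expected rights-minus-wrongs score for a pupil who guesses every item.

    n_true, n_false: number of true and false items on the test.
    p_true: probability that a guess is marked "true" (assumed constant).
    """
    expected_right = n_true * p_true + n_false * (1 - p_true)
    expected_wrong = n_true * (1 - p_true) + n_false * p_true
    return expected_right - expected_wrong


# Hypothetical 100-item tests; 0.62 is used only as an example guessing rate.
for n_true, n_false in [(50, 50), (60, 40), (40, 60)]:
    score = expected_r_minus_w(n_true, n_false, 0.62)
    print(f"{n_true} true / {n_false} false: expected R - W = {score:.1f}")
```

A pupil who knows nothing thus gains on a test weighted toward true items and loses on one weighted toward false items, although R − W is meant to leave pure guessing with an expected score of zero.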

Perhaps the most generally significant aspect of this study is the
clear-cut evidence that careful analysis is necessary to determine the 
behaviors underlying an individual’s test score. Teachers have 
commonly been confident that a test is valid if its items represent 
the material being taught. Evidence in this study shows that true 
items and false items measure different things, so that a student 
making a high score on apparently valid true items might do badly 
on equally valid-appearing false items. A careful search for under- 
lying traits in other tests may show that many tests which appear 
valid to logic are not so trustworthy as measures of the nature of 
human minds.


DIFFICULTIES OF COMMUNICATION BETWEEN
EDUCATORS AND PSYCHOLOGISTS: 
SOME SPECULATIONS 
ROGER G. BARKER 


Stanford University 


A practical social issue of immediate personal and professional 
importance to many psychologists is the relationship which exists 
between them and the personnel of related disciplines. As an instance 
we may consider some social interrelations between psychologists 
and educators. This will serve, also, as an example of the more 
general problem of difficulty of communication within social groups. 
That this problem is of great theoretical and practical signifi- 
cance, there can be no doubt. Effective intra-group communication 
is essential to group action, and the break-down of communications 
is a frequent source of failure of group undertakings. Tragedies 
such as that which occurred at Pearl Harbor obviously are due 
in part to faulty communication within government organizations. 
Less dramatic failures are continually occurring. In the instance 
to be considered, difficulties of communication between psychologists 
and educators handicap both, for each depends in an important degree 
upon the other: psychologists for jobs and socially significant prob- 
lems; educators for a scientific basis for school practices. 


THE CULTURAL BACKGROUND OF TEACHERS 


Difficulties in communication occur when the psychologist and 
potential educator first meet in the college classroom. Anyone who 
has taught child psychology to teachers and student-teachers has 
inevitably noted a resistance to certain findings and viewpoints in 
psychology and a difficulty on the part of some students in compre- 
hending without distortions certain common ideas about the behavior 
of children. This is not surprising, and it is certainly not peculiar 
to teachers. We know a great deal about selective awareness, 
rationalization, compulsive behavior, repression, and other phenomena 
of motivation wherein it is impossible to comprehend some ideas, 
and possible to comprehend but not act upon others. However, little 
has been done in the way of specifying the conditions existing in the 
teaching situation which give rise to this behavior. Recent
social-psychological research suggests where we should look.

The meaning of an observation depends upon its context or frame
of reference, and an important source of this context is the culture of
the observer. All students bring their private cultures to class with
them; they interpret the facts of child behavior, for example, in
accordance with their own beliefs, and with those of their families,
their churches, their school boards, and their associates. Although
teachers do not differ from other students here, there is evidence that
teachers tend to be selected from a common cultural background, and
hence tend to interpret the findings of child psychology with a common
bias. Years ago Coffman showed that teachers as a group come
from the lower middle classes of the social structure. This is in line
with the recent findings of Warner and his collaborators that
schools are an important channel of social mobility in our culture, 
and that teachers are one of the mobile groups in the population. 
Here is a profession where, without the usual requirements of money 
and family, it is possible to attain admission to higher social strata. 
The profession of education attracts large numbers of persons who 
value social achievement, and it in turn fosters these values. Besides 
providing immediate social advantages to persons of the lower middle 
classes, teaching is a well recognized route to higher professional 
levels. Great numbers of the population can say, “I taught for a
while,” and many teachers frankly consider the profession a
respectable temporary occupation until something better turns up.

For these reasons, in the culture of many teachers the ideology 
of individual progress, attainment, betterment, and self-improvement, 
all defined in terms of social status, looms very large. They bring to
the study of child psychology a design for living in which the test of 
value is social achievement. Under these circumstances important 
facts of child psychology are bound to be resisted. For example, the 
facts of unconscious motivation and the inheritance of intellectual 
abilities run counter to fundamental assumptions in the culture of 
great numbers of teachers. Such facts must, therefore, be denied, 
reinterpreted, or misunderstood if the pattern of teachers’ lives is not 
to be seriously disturbed. Even the finding that such activities as 
play, art, literature and science, can give emotional satisfactions and 
serve as substitutes for thwarted needs has a very special significance 
in a culture where adjustment, satisfaction, and happiness do not
exist apart from social attainment. Insofar as the “escape” and
“releasing” effects of these activities contribute to the effectiveness
of social achievement, they are considered desirable by these people,
just as adequate sleep and recreation are considered desirable. But
insofar as such activities contribute to the satisfaction of the individual 
in his present social position, and remove from him the anxiety that 
leads to social striving, they are considered undesirable. As a sub- 
stitute for social mobility such activities are to be deprecated. Many 
teachers are seriously concerned about the great amount of satisfaction 
and lack of anxiety they find in their students. A recurring motif 
in the papers written by many teachers is the expression of a concern 
for the failure of students to take advantage of their opportunities 
to better themselves. The anxieties that many teachers feel most 
keenly come from their failure to arouse in their students the social 
ambitions of which they consider the students worthy. With such a 
system of values, it is obviously impossible for teachers to accept 
and act upon the suggestions of mental hygienists who hold other 
cultural values, and to adopt procedures that are likely to increase the 
sort of behavior the teachers strongly deplore. 

This is but a single example. There are many other facts about 
the private worlds of teachers into which they must with difficulty 
fit the findings of child psychology. For example, the social groups 
from which teachers largely come tend to have strong religious con- 
victions and standards of morality in terms of which behavior is cate- 
gorized as right or wrong. One wonders what the many teachers 
who come from such groups and for whom such beliefs are very central 
make of such observations as the ego-centricity, the amorality, and 
sexuality of children. There is, in addition, the whole field of personal 
motivation, the highly individual needs that bring teachers to work 
with children: Substitute parenthood, easy domination, security, 
service, etc., all of which give the facts of child psychology very 
particular, personal meanings in order that teachers may use children 
for their own purposes. 

If these assumptions are true, it is not difficult to understand why 
the education of teachers should be less satisfactory than that of 
physicians and engineers. Physicians and engineers are probably 
selected from social strata where there is less conflict between accepted 
mores and the facts with which they must deal. In addition, engineers 
and physicians have a much longer and more rigorous period of 
initial training that provides to a considerable extent for an appro- 
priate background of viewpoints and attitudes. 

Here is one of the primary facts that those who would improve 
teachers’ knowledge of children must face. It is related to the general
problem of acculturation, particularly to the question of the fate of
cultural fragments introduced into new cultures. It is for students 
of culture change to say what, if anything, can be done about such a 
state of affairs, if it is desirable that something be done. Two ele- 
mentary facts may be mentioned, however. First, the very short 
contact most teachers have with the “foreign” ideas of child psy- 
chology means that the influence of these ideas must be slight. Until 
teacher training in psychology is much more thorough, this state of 
affairs will certainly continue. Secondly, the motivation of most 
teachers, at least during the period of their training, is such as to
insure a minimum of learning. To all but a few the required
psychology courses are a necessary nuisance to be endured with a minimum
of involvement in order that most attention may be devoted to chosen 
academic fields. 

An additional aspect of the situation that makes the problem 
even more difficult lies in the fact that many of the aspects of the 
behavior of teachers with which psychologists are concerned derive 
from very central needs of the person, whatever the cultural back- 
ground may be. Parent-child behavior is so deeply embedded in 
the personality structure the culture enforces from the first day of 
life that rational argument is relatively ineffective in modifying many 
aspects of it. This means that the behavior of adults toward children 
is so firmly built into the personality that it is in some respects com- 
pulsive. Furthermore, much of this behavior is closely related to 
central evaluative convictions of religion, morality and socially 
acceptable conduct. In our culture few facts of engineering or 
agriculture, for example, come into conflict with fundamental per- 
sonality mechanisms, or accepted religious and social practices. 
Doubtless there is more conflict and resistance in the case of medical 
knowledge. But one can hardly think of a fact of child behavior, a 
technique of discipline, or a goal of child education about which most 
persons do not have strong, emotionally toned convictions. It 
appears, therefore, that many other applied scientists: Engineers, 
physicians, agriculturalists, etc. are better able emotionally to face 
the facts with which they must deal than are teachers. 


FORCES IN THE TEACHING SITUATION 


The teacher of teachers finds that viewpoints and procedures that 
are fully understood, mastered, and accepted outside the teaching 
situation are not infrequently ignored in schoolroom practice. This
suggests that, in addition to factors in the general culture which
prevent teachers from accepting some facts of child psychology, there 
are conditions in the school situation itself which prevent them from 
acting upon the knowledge they have intellectually accepted. Refer- 
ence will not be made here to possible administrative restraints upon 
particular practices, but to the structure and dynamics of the general 
teaching situation. This is a complicated matter, of course, but there 
is one aspect of the situation of teachers that accounts for much. 
Teachers appear to be placed in conflicting, overlapping, social situa- 
tions to a greater extent than are most professional people. Teachers 
must be highly sensitive to the changing demands of many relatively 
independent groups: Their classes, their colleagues, their administrators,
their communities. Because of their exposed and dependent position, 
the behavior of teachers is very sensitive to these simultaneously 
acting, but independent and often conflicting influences. Consider 
some concrete determinants of a teacher’s behavior in the classroom. 
First of all, there is the classroom situation: The attitude of the pupils, 
the requirements of the lesson, and the teacher’s intentions and ideals 
with respect to it. At the same time the teacher’s behavior is to some 
extent determined by the facts of the larger school situation: Perhaps 
an uncertainty as to the attitude of the administration toward his 
work; a feeling of frustration, failure, and abuse because a colleague 
has received an “‘unwarranted”’ salary increase; or a feeling of futility 
over the small prospect of professional advancement. There is also 
the community situation which the teacher cannot escape and to 
which he is particularly sensitive: Limitations upon his personal 
freedom in some political, social, and economic spheres, and coercion 
in others. 

The fact that not all such overlapping situations enter the focus 
of attention during the class period does not, of course, mean that they 
are not operative. Recent work by Lewin and his students in
experimental situations and by Roethlisberger and Dickson in factory
situations has demonstrated that in overlapping situations where the forces
are in conflict, behavior is modified even when one situation completely
dominates consciousness. On the basis of such findings one could
predict, for example, that a teacher’s intended enthusiasm in his
classroom when it overlaps with a larger school situation where
disappointment and frustration are dominant, and with a community situation
where coercion and insecurity are important factors, would be
considerably tempered and perhaps become totally ineffective. The
extent of the modification would depend upon the relative potencies
of the overlapping situations and the constellation of their forces. 
The importance of these aspects of the immediate situation for 
the improvement of educational practice is this: So long as the
situations that overlap with the classroom are not taken into consideration
and made to support or, at least, made not to conflict with the good
intentions of the teacher, his practices will be more or less at variance
with his intentions, since conscious intentions constitute only a part,
and often a minor part, of the total constellation of forces acting
upon him.
The experience of the Western Electric Company in Chicago in
dealing with these kinds of problems in the work situation is pertinent
in this connection. A staff of interviewers, partly by giving
opportunities for emotional expression, partly by promoting cognitive
insight, and partly by securing adjustments in the work situations,
has been able with remarkable success to resolve the conflicts of the
overlapping work, factory, family, and community situations. These
aspects of the workers’ situation have been found to have a much 
greater influence upon productivity than the physical conditions of 
work. Such procedures have very great possibilities in the schools 
as a means of freeing the teacher to follow his best intentions in the 
classroom.


THE ACADEMIC STATUS HIERARCHY


Difficulties of communication exist not only between psychologists 
and student-teachers, but also between psychologists and their col- 
leagues of the education departments within the colleges. The 
situation in great numbers of institutions is such that there is often 
not only lack of professional interchange, but in many cases hostility
between psychologists and educators. Many education departments
offer their own instruction in psychology, often duplicating courses
given in the psychology departments. Even within such education
departments the channels of communication between psychologists,
educators, and training-school teachers are likely to be restricted, 
with the consequence that there is little integration between psy- 
chological principles and teaching practices. 

An important source of this state of affairs is undoubtedly the 
status of education and psychology in the hierarchy of disciplines in
the colleges. That a status hierarchy does exist is evident from
even a casual acquaintance with college institutions. In general,
the long-established, pure sciences and the liberal arts are near
the top. Astronomy, physics, chemistry, mathematics, history,
are full of prestige; they set the pattern of educational behavior for
the other disciplines. Near the bottom of the hierarchy are the 
applied sciences and vocational subjects. Commerce, agriculture, 
hygiene, journalism, are in lower classes of the academic world— 
and amongst the lowest of these is education. There is probably 
some variation in the stratification from institution to institution, 
though without doubt a very general agreement exists. 

Such hierarchies of status are recognized as characteristic of many 
social groups; they are in no way peculiar to academic institutions.
Recent research has revealed something of the significance of such
class structuring for the behavior of the persons within the classes.
The effect of mobility upon behavior is of especial importance in the 
present connection. Individuals or groups of individuals are mobile 
when they are in the process of changing their class identification to 
an upper or to a lower class. While this change of class identification 
is in process, the individual is in a very insecure, overlapping position. 
If he is upwardly mobile he is trying to throw off his attachments with 
the class below him and to strengthen his connections with the one 
above him. This process has been studied in connection with such 
diverse hierarchies as color stratification in Negroes, race stratification
in the Jews, and social and age stratification in many groups. In all of
these cases upwardly mobile individuals exhibit many similarities in
behavior. Three of these are of importance here: (1) Antagonism
toward lower class individuals with whom identification is possible;
(2) antagonism toward higher class individuals who will not admit
equality of status; (3) emphasis upon the symbols of class, i.e.,
exaggeration of upper class symbols and suppression of lower class
symbols.

As a relative newcomer amongst the sciences the status of psy- 
chology in the hierarchy is as yet indefinite. However, it is certainly 
neither in the lowest nor in the highest brackets. That the generality
of psychologists are desirous of establishing a high academic status for
their science seems inevitable. That they exhibit the same behavior as
mobile persons in other class hierarchies seems inevitable also. At least
such behavior is easily observed. The antagonism of psychologists to
education and educators appears to be essentially the same kind of
behavior as the antipathy of the “light” Negro for the dark, of
adolescents for younger sibs, of the liberal Jew for the orthodox, of higher
class persons for their lower class relatives. Such behavior is, of course,
fully rationalized in all of these cases and its existence is frequently 
vehemently denied. The fact remains that in general psychologists 
lose academic status by fraternizing on equal professional terms with 
educators, and in such a case cannot be expected to do so. On the 
other hand, by emphasizing their ties with such upper class disciplines 
as mathematics, physics, and physiology, psychologists improve their 
status, and they cannot be expected to do otherwise. The strength of 
these forces toward academic acceptability in keeping individuals in 
low-paying jobs and working on problems of little potential significance 
is frequently obvious. That they should cause aloofness toward edu- 
cators is in no way remarkable. 

Psychologists have other antagonisms that contrast dramatically 
with their antagonism toward education. Medicine is undoubtedly 
above psychology in the professional hierarchy. However, there is 
much antagonism expressed by clinical psychologists toward psy- 
chiatrists. Superficially this is not unlike the criticisms expressed of 
educators. However, in this case the behavior appears to be on a par 
with the resentment of the ambitious brown Negro against the high 
yellow, or the contempt of the aspiring middle class person for the 
decadent 400. A psychologist’s professional status is improved if he is 
accepted as an equal in a medical group; however, this infrequently 
happens; usually he is included only as a technician or an assistant. To
be excluded by those of higher-status professional disciplines is as 
adequate a reason for resentment as to be excluded from the tables of 
the best families. 

The position of educators in the academic hierarchy is low, but 
educators are also mobile and react to the aloofness of higher class 
psychologists in the same way that psychologists react to the coolness 
of psychiatrists. There is this difference, however: The status of 
medical people is so secure that contact with lower class psychologists is 
not dangerous. Defensive reactions are therefore not necessitated. 
In the case of educators and psychologists, however, the insecurity of 
both gives rise to mutual defensive reactions that tend progressively to 
magnify the original schism. 

This does not mean that the criticisms of psychologists of educators 
and vice versa are all groundless rationalizations; many are justified. 
However, professional hierarchical positions may dominate the personal 
relations of the individuals involved, provide a reason for nursing 
legitimate grievances, contribute to the failure to compromise
disagreements, and, in consequence, operate as a causative factor for the poor
communication between psychologists and educators in colleges.


DIFFERENCES IN EMPHASIS 


Another reason for the communication difficulties of psychologists 
and educators may be mentioned. It appears to lie in the fact that 
educators and academic psychologists are very largely concerned with 
different phenomena, although they have frequently failed to recognize 
this. 

Behavior is a continuum from birth to death, and to study behavior 
it is necessary arbitrarily to select particular segments for analysis. 
These may be as short as reaction times or as long as biographies. 
The longer segments of behavior contain the shorter, but they cannot 
be explained in terms of the shorter, or in terms of the concepts that are 
adequate to explain the shorter segments. 

A teacher who catches a glimpse of a pupil on the stairs and asks, 
“What is that child doing?” might receive several true answers,
namely: (1) flexing certain muscles and relaxing others; (2) stepping
downward from the fourth to the fifth step; (3) going downstairs; (4)
leaving the school building; (5) running away from school. It is
obvious that a curve representing the teacher’s degree of interest in these
replies would rise from answer (1) to answer (5). In other words,
the teacher is more interested in actions (achievement of effects)
than in actones (mechanisms of achievement), to use the terms
employed by Murray, or more in molar than in molecular behavior, to
use the terms of Tolman. On the other hand, a curve representing the
interest of the generality of academic psychologists would descend
from answer (1) to answer (5). This is probably true in large part
because the competence of psychologists is much greater in the field
of actones than in that of actions. The triumphs of experimental
psychology have been achieved in studies of sensation, perception,
reflex action, conditioning, etc. Necessarily, much of the
instruction in psychology must be devoted to such molecular behavior. It is
true and it is important; but it does not provide a basis for an under- 
standing of the molar behavior with which teachers must be largely 
concerned. Why this is true is a technical matter that cannot be con- 
sidered here, but it may be illustrated by an incident from other fields. 

A roving reporter in looking for a story had arrived at the railway 
station as a heavy freight train slowly passed through the yards. He 
approached a spectator who turned out to be an engineer and asked him
to explain briefly and simply why the train was moving; he received this
reply: “Because the force of the steam against the pistons is greater
than that of the resistance of the opposing forces of gravity and
friction; that is why the train is moving along the tracks.” The reporter
next turned to a spectator who was an economist and propounded the
same question. In this case he received this reply: “That train is
loaded with wheat; there’s a great demand and a good price for wheat in
Chicago at the present time, that is why the train is moving down
the track.” It is obvious that these two true explanations of different
segments of the same phenomenon are in no way interchangeable, that 
the engineer will receive no help from the economist’s explanation and 
vice versa. In terms of this analogy, it may be said that in the past 
psychologists have very largely given educators engineering answers to 
economic problems. Until this is changed the usefulness of psychology 
to education will remain limited. 

That many psychologists desire to do something about this is evi- 
dent from the frequent attempts made to integrate psychology with life 
and with education. One can commend this move without being
unaware of the tendency to become merely popular and to present literary
psychology in such books and courses. Explanations of molar behavior
to be adequate must be just as strictly scientific as explanations of 
molecular behavior. Until psychologists can deal rigorously with the 
behavior with which educators must be concerned it is inevitable that 
educators will refuse to take the suggestions of psychologists as seri- 
ously as they should be taken. 

There are other less basic reasons for the misunderstandings between 
academic psychologists and teachers. One of these that may deserve 
mention here is the confusion of viewpoints and terminologies that 
abounds in psychology today. It is understandable why the educator 
who does not have the time to reconcile one viewpoint with another in 
his own thinking is likely to cast a plague on all the houses of psychology 
after attending several courses or reading several books. This state of 
affairs is a product of the rapidly developing state of the science at the 
present time, and little can be done about it. However, an effort can 
be made to secure consistency in the brand of psychology taught in 
any sequence of practical education courses. There are undoubtedly 
several schools of psychology that have practical usefulness for educa- 
tion if followed consistently. However, the jumble of superficial 
viewpoints with which teachers are likely to be presented under the 
banner of eclecticism is likely to be of limited value.

The hypotheses which have been proposed to account for
communication difficulties between psychologists and educators are based
upon casual observation and are stated in terms particular to the 
instance considered. This consideration may serve, however, to
indicate that here is a fertile field for conceptual clarification and
experimentation upon a social problem of great practical and theoretical
importance.


A REVIEW OF RECENT STUDIES
ON MUSICAL APTITUDE* 


SYLVIA F. BIENSTOCK 
Jackson Heights, New York 


This review undertakes to survey the studies of musical aptitude 
published between 1934 and 1940 inclusive. A few earlier studies not 
covered in Farnsworth’s 1934 review are also included. Studies
pertaining to tests of musical aptitude and their prognostic value, which
have been published in musical or in psychological journals, are few.
Reporting on a survey of over one hundred investigations just
completed or in progress in 1936, Bienstock found fewer than five on music
prognosis. In the past two or three years there has been somewhat 
more activity in this field. 


THE SEASHORE MEASURES OF MUSICAL TALENT 


All of the investigations reviewed used the original Seashore
measures since the revision of these tests did not appear until late in
1939. With the appearance of the new edition the figures on the
reliability and validity of the older measures are more of academic
interest than of practical importance. Mursell reviewed the
literature on the reliability and validity in 1937, including the work of
Larson and the extensive reports of Stanton. He found the reliability
of even “the two best tests, Pitch and Tonal Memory, significantly
smaller than those yielded by our most satisfactory and accurate group
intelligence tests. . . .” Mursell concluded his intensive review of
Stanton’s work, which he considered the most ambitious report on the 
validity of the Seashore tests, as follows: 


One thing is abundantly clear. We have here nothing in the way of a 
direct validation of the Seashore Measures of Musical Talent. The results 
are evidently of considerable practical value. But they are based on factors 
elaborately combined and nowhere analyzed in isolation from each other. 
They furnish no proof whatever that the Seashore tests given independently 
of any other measures will yield a valid index of musical capacity. 


These conclusions do not necessarily apply to the newer edition
of the tests. Experimental evidence on the reliability and validity of
the 1939 revised measures is presented by Saetveit. There is a large
task to be undertaken by future investigators in the further validation
of these measures.

* The writer is grateful to Dr. Albert J. Harris, of the College of the City of
New York, for his critical review of this manuscript.


THE KWALWASSER-DYKEMA TEST OF MUSICAL ABILITY 


The most comprehensive review of research relative to the reliabil- 
ity of the Kwalwasser-Dykema Test of Musical Ability?!* has been 
reported by Farnsworth.’ He concluded his summary of the most 
important investigations conducted prior to 1934 with the statement
that:


The only possible conclusion to be made on the subject of reliabilities would 
seem to be that, with the possible exception of the test of Tonal-Movement, the 
Kwalwasser-Dykema Tests are too unreliable for individual prognostication. 


Manzer and Marowitz,* in 1935, reported the following retest
coefficients when one hundred and one college sophomores and juniors
were given the K-D tests: Tonal Memory .73 ± .03; Quality Discrimination
.32 ± .06; Intensity Discrimination .05 ± .07; Time
Discrimination .43 ± .06; and Rhythm Discrimination .48 ± .05.
They concluded with the reminder that these tests “should be used for
individual diagnosis with great caution.”
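
The second figure attached to each coefficient above is not identified in the text. If it is read as the probable error customary in reports of this period, it can be approximated from r and the number of cases. The sketch below, in Python, assumes the classical large-sample formula PE = .6745(1 - r²)/√N; the function names are illustrative and are not taken from any of the studies cited.

```python
import math


def standard_error_r(r: float, n: int) -> float:
    """Classical large-sample standard error of a correlation: (1 - r^2) / sqrt(n)."""
    if n < 2:
        raise ValueError("at least two cases are required")
    return (1.0 - r * r) / math.sqrt(n)


def probable_error_r(r: float, n: int) -> float:
    """Probable error, taken as 0.6745 times the standard error (an assumption)."""
    return 0.6745 * standard_error_r(r, n)


# Retest coefficients quoted from Manzer and Marowitz; N = 101 as stated in the text.
reported = {
    "Tonal Memory": 0.73,
    "Quality Discrimination": 0.32,
    "Intensity Discrimination": 0.05,
    "Time Discrimination": 0.43,
    "Rhythm Discrimination": 0.48,
}

for name, r in reported.items():
    print(f"{name}: r = {r:.2f}, PE = {probable_error_r(r, 101):.3f}")
```

With r = .73 and N = 101 the formula gives a probable error of about .03. Other studies may have used a different N or a different error measure, so agreement should not be assumed elsewhere.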

In 1938, Wiener®® administered the K-D tests, after an interval 
of one year, to one hundred students of the High School of Music and 
Art in New York City. The correlations were of the following
magnitudes: Tonal Memory .56 ± .06; Quality Discrimination
.40 ± .07; Intensity Discrimination .21 ± .08; Rhythm Discrimination
.08 ± .09; and Time Discrimination .50 ± .06. Since these subjects
were supposed to represent a highly selected group and many of them 
attained the maximum or nearly maximum score on each of the tests, 
an effort was made to study the effect of skewness. The reliability 
of Test I, the Tonal Memory Test, was recalculated with all maximum 
or nearly maximum scores eliminated. The results indicated that:
“. . . eliminating the factor of skewness which might hinder reliability
does not seem to increase reliability of Test I but, in fact, decreases
it.” Utilizing eighty high-school students, many of whom were
also included in the Wiener study, Bienstock® reported these values:
Tonal Memory .52 ± .09; Quality .45 ± .10; Intensity .35 ± .10;
Time .00 ± .10; and Rhythm .39 ± .10.

All of the investigators obtained their highest coefficients for the 
test of Tonal Memory. While these values were positive, they were
nevertheless too low to be considered reliable measures for the prediction
of individual musical ability. The reliabilities for the Rhythm
test were consistently lower than those for Tonal Memory. However,
when scores on this test were correlated with grades in music courses
Bienstock® found a value that was reliably greater than zero. She
concluded “that a test of this sort, improved to the point where it
would have satisfactory reliability, might be of really practical value in
musical guidance.”


* This battery of tests will hereafter be referred to as the K-D tests.

Farnsworth’s'* summary also included investigations pertaining to 
the validity* of the K-D tests. This battery was correlated with 
grades from courses in which various types of skill in musical perform- 
ance were stressed. The values ranged from negative coefficients to a 
maximum of .40. Farnsworth concluded that “insufficient data are at
hand relative to the matter of validity.” Bienstock® correlated five
tests of the K-D battery (Tonal Memory, Quality, Intensity, Rhythm, 
Time, and a total of all five) with marks earned by eighty students in 
music theory, instrumental or voice instruction, orchestral or choral 
practice, and music regents marks. These values, which ranged from
-.15 to .43 with an average of .14, were in close accord with those of
previous investigators. 
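
The validity coefficients discussed above are product-moment correlations between test scores and course marks. As a reminder of how such a coefficient is obtained, the following Python sketch computes Pearson's r for two lists of paired scores. The data are invented for illustration and do not come from any study cited here, and the function name is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two cases")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Invented scores for six hypothetical students (not data from the literature).
talent_test_scores = [12, 15, 9, 20, 17, 11]
course_marks = [78, 74, 80, 85, 70, 82]

print(f"r = {pearson_r(talent_test_scores, course_marks):.2f}")
```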


EFFECT OF TRAINING UPON MUSIC TALENT TESTS 


The few studies which considered the effect of training upon K-D 
scores are contradictory. Farnsworth'*—quoting from Chadwick,’ 
whose subjects were sixty-seven music major students at Colorado 
State Teachers College, and from Barnard,' whose subjects comprised 
grade-school children—found a positive relationship between training 
and K-D score. Those students who had received training earned 
higher scores on the K-D tests than those who received no training. 

Kwalwasser?? compared the mean scores of junior-high-school
students who had received no music training outside of school with 
those who received ten or more private music lessons. The mean 
for the untrained group on the K-D battery was 176.25, while for the 
trained group it was 187.50. The difference appeared to be statistically 
significant. Gilbert!* administered the K-D tests to one thousand col- 
lege students in five states. He found the women earned higher scores 
than the men. However, the women had also received more training,
and when only the untrained groups were considered, the sex differences
disappeared.


* The author was unable to secure any literature in which the initial validation
of these tests was reported.

Wiener, Bienstock, and Drake reported results somewhat at variance
with the above. Wiener®® obtained an extremely low coefficient
when students’ initial scores on the K-D tests were correlated with
their retest scores after one year of intensive training both in
and out of school. There appeared to be no relationship between
practice and improvement in test scores. Bienstock‘ observed a slight 
gain on K-D scores after one year of training, but this gain was not 
statistically reliable. Drake’ utilized only the K-D Tonal Memory 
item in his battery, which included, also, the Seashore Measure of 
Pitch and the Drake Musical Memory Test. He reported: “The
author has found that after several years of almost daily musical training,
there is no more improvement than that accounted for by maturation
as provided for in the age norms.”

Tomaoka® tested five hundred nineteen middle-school, high-school, 
and a few university students on the six Seashore tests and his conclu- 
sion agreed with the latter group of experimenters; namely, that the 
effect of training was not marked. Semeonoff* designed an entirely 
different type of test for his students, who were members of the School 
of Arts and Music. Each subject was asked to state whether he liked 
each of a series of ten gramophone records and to select the correct 
interpretation from a group of four alternatives. He decided that
ability to interpret music is “relatively independent of training and
experience.” In a later study which included also tests of Musical
Taste and Musical Knowledge Semeonoff* tested two hundred seventy
boys and girls ranging in age from twelve to fifteen years. He reported
that “significant differences on most measures were found between
children who played an instrument and those who did not.” He said,
however, “a variety of difficulties, such as knowing how to deal with
performers on the Jew’s harp and kazoo and the heterogeneity of the
sample, make this criterion one of doubtful value in the present case,
but the observed differences are highly suggestive.” He summarized
his findings in this statement: “No conclusive evidence, however, was
obtained regarding the extent to which the scores were affected by
differences in education.”

The deciding factor on this issue may be a consideration of the 
amount of training which the subjects received. As a rule, where 
the training was brief the effect upon the tests was more marked than 
where the training was extended over a longer period. Only a controlled
study covering a prolonged training period can adequately
answer this question.


RELATIONSHIP BETWEEN MUSIC TALENT AND INTELLIGENCE 


Farnsworth" in 1934 found only three studies which reported on the 
relationship between intelligence and the K-D tests. The coefficients 
ranged from .03 to .26 when scores on the K-D battery were correlated 
with the intelligence test ratings of college students. Since Farns- 
worth’s report many more results have been published and a divergent 
viewpoint has been noted. 

Using future teachers enrolled in a fine arts college, Kwalwasser?? 
obtained a coefficient of .03 between the K-D battery and the Thur- 
stone Psychological Examination for High-school and College Fresh- 
men. The only study in which high-school students served as subjects 
was that of Bienstock.* The scores of five of the K-D tests were cor- 
related with the intelligence quotients obtained from either the Terman 
Group Test of Mental Ability or the Otis Self-Administering Test of 
Mental Ability. These values ranged from -.10 to .29 for the individual
K-D tests, and .22 for the average of all five tests. Several
investigators have had recourse to junior-high-school students. Wise*’ 
revealed a correlation of .42 for the Otis Classification Test and all ten 
of the K-D tests. Kwalwasser?* administered the same batteries to 
seven hundred junior-high-school pupils with a resultant r of .34. 
Drake’s'? subjects—one hundred sixty-three English boys with an 
average age of thirteen years—were given two of the K-D tests along 
with a battery of several other music tests and an unspecified intelligence
test. His r’s were all below .27. Drake assumed that earlier
studies which reported higher correlation values suffered from the use of
bad tools, “from halo effects,” or from “spurious correlations.” He
concluded that when relatively pure measures of musical talent are 
used there is no significant relationship between musical ability and 
intelligence. 

Mursell*? found a division of opinion in his recent résumé. The 
American studies almost uniformly reported very little relationship 
between intelligence and musical talent (generally measured by the 
Seashore tests), while the European studies consistently indicated a 
close relationship between intelligence and musicality (generally meas- 
ured by functional criteria). Several more recent American studies 
have upheld the latter viewpoint. Schoen“ explained that we have 
evidence that in intelligence the musically talented person ranks above 
the average. Stanton and Ross used the Seashore tests for their
results. Stanton*® asserted that students with profiles indicating 
high talent had for the most part higher comprehension scores than 
those with profiles indicating low talent. Ross® tested fifteen hundred 
forty-one California public school children, grades V through XII, on 
the Terman Group Test of Mental Ability as well as on the six Seashore 
tests with r’s of .06 to .25. Although these values were not high enough
to be of predictive value, Ross found that those pupils whose
musical talent was sufficient to classify them as of superior musical ability
were superior to the general population in intelligence.

Dykema, Bienstock, and Beckham used the K-D tests and pre- 
sented results which suggested a similar trend. Dykema!® adminis- 
tered the battery to fifty-eight hundred forty European children aged 
nine through eighteen, and concluded: 


There is a decided tendency for brighter children, that is to say those 
who are younger than the predominating age of the grade, to rank higher 
than the normal children of their age, and sometimes, even to surpass those 
who are older. 


Bienstock‘ noted that students who ranked highest on intelligence 
were also among the highest on K-D test scores and, similarly, the 
lowest IQ’s were accompanied by low K-D scores. In Beckham’s?
study of intellectually superior Negro children he found the number 
of superior children above the fiftieth percentile on the K-D battery to 
be greater than the number of non-superior children. 

It is clear that at present we do not have the answer to this question. 
The difficulty probably lies in the inadequacy of our tests for measuring 
musical ability. When correlations for intelligence and musical ability 
were reported, large numbers of subjects were involved, and the 
resultant coefficients were all positive though low, thus indicating some 
relationship. When the ability of individual subjects was considered
it seemed apparent that a high degree of music talent was usually 
accompanied by superior intelligence and, similarly, a low degree of 
music talent was often associated with inferior intelligence. 


THE RELATIVE VALUE OF INTELLIGENCE AND MUSIC CAPACITY TESTS FOR
PREDICTING SUCCESS 


To date we have no one measure or test which predicts success in 
musical endeavor with a high degree of accuracy. In this, music 
parallels other subjects for which we have no one test but instead a
battery of several measuring devices which often prove helpful. Many
investigators have attempted to measure a few of the many aspects of 
musical ability. It seems likely that a composite of several of these 
measurements may prove to be the reliable test of musical ability which 
is at present eagerly sought by many educators. 

Sight-singing and Ear-training.—The evidence for the prediction of 
success in sight-singing and ear-training is presented in Table I. 
Usually classes in sight-singing devoted some time to ear-training and 
vice versa. The grades utilized were an average of the final mark or 
marks received by the students at the end of a period of training rang- 
ing from one term to three years. The coefficients for the correlation 
of intelligence and sight-singing were all below .34, with the exception of 
those presented by Chadwick.’ His values, in general, however, were 
higher than those presented by the other investigators. One reason 
for this may have been the fact that he tried to use a “very objective
performance test” for measuring sight-singing ability. The coefficients
for sight-singing and the Seashore tests ranged from .22 to .75,
while those for sight-singing and IQ’s were consistently lower. The 
general trend of these results suggested that for achieving success in 
ear-training or sight-singing, a high score on the talent test, particu- 
larly on the Pitch or Tonal Memory Tests, was more important than a 
high intelligence quotient. 

Music Theory.—For success in theoretic music the findings were not 
in as close accord. One reason for this may have been the fact that 
theory in some schools included several aspects of music, such as 
ear-training and aural harmony, while in other schools theory was 
devoted almost exclusively to a study of the principles of music 
composition, history of music, and music appreciation. Theory grades 
as defined by the latter definition were utilized by various investigators 
presented in Table I. Highsmith,’ Wilson,® and Larson* correlated 
theory grades with the total of all the Seashore test scores and obtained 
coefficients of .49, .21, and .59, respectively. Highsmith and More” 
correlated music theory grades with each of the Seashore tests and 
found values from -.07 to .73 with most of the figures below .40.
Farnsworth’s' and Bienstock’s® scores for theory included some aspects 
of aural perception. Both of these experimenters obtained low coeffi- 
cients when theory grades were correlated with scores on talent tests. 
Farnsworth made use of two hundred sixty-three students to correlate 
intelligence quotients with grades in history and appreciation of music 
with resultant r’s of .41 and .32. The same grades yielded coefficients 
of .14 with Pitch and .16 with Tonal Memory. Highsmith and
Bienstock obtained values of .60 and .58, respectively, when IQ’s were 
related with theory marks. Pike** administered the K-D tests to a 
group of one hundred students selected at random from the music 
education department at Temple University. He concluded that the 
general intelligence test was approximately four times as accurate as 
the K-D test for prognosticating the quality of work in music courses.
Larson maintained that the Seashore tests have definite value for 
prognosticating success in theoretic music, while Highsmith and Bienstock
agreed with Pike that a high IQ was definitely more important
than a high score on the talent tests for predicting grades in theory.
The weight of the evidence appeared to be with Farnsworth who con- 
cluded that music capacity tests were better tools of prediction for 
courses in which tonal perception and performance were emphasized,
while intelligence tests were better for those courses in which the more 
academic side was stressed. Intelligence tests appeared to predict 
grades in music theory about as well as they predict other school marks. 
Current literature is in fairly close agreement in placing the correlation 
for IQ and school marks between .4 and .6. 

Applied Music.—The scarcity of investigations relative to applied 
music may, perhaps, be attributed to the difficulty of testing latent 
ability as well as of obtaining reliable measurements of musical perform- 
ance (applied music). Farnsworth! discarded a section of his study 
dealing with grades in practical music (violin, piano, etc.) because he 
did not deem them sufficiently reliable. Mursell,*° on the other hand, 
asserted that grades in applied music seemed to have sufficient reliabil- 
ity to be utilized as material for validation, and Seashore‘? maintained 
that “for the first time we now have instruments adequate for the 
recording of all significant aspects of music performance by voice or 
instrument.” 

Larson*‘ considered the Seashore tests reliable measures for prog- 
nosticating success in performance upon a musical instrument. Under 
her direction tests are now used for the selection of students who 
receive from the city of Rochester free instruction upon musical 
instruments. The instruments are also furnished gratis to the students 
for the duration of their school careers. Wilson® did not believe the 
Seashore tests adequate for predicting success in applied music, since 
he found a correlation of only .21 between these two measures. 

Two investigators used both intelligence tests and the Seashore 
tests for predicting marks in applied music and found neither test 
satisfactory for this purpose. More” correlated intelligence quotient
with the average of all the music marks, including applied music, and
found an r of .29 ± .05, while the correlations for marks in applied
music and each of the Seashore tests ranged from -.15 to .46. Highsmith”
found a coefficient of .42 when marks in applied music were correlated
with intelligence, and a coefficient of .31 when they were
correlated with the average of all the Seashore tests. He decided that 


ability in musical performance seems either to be dependent upon traits only 
slightly touched by any of the measures used here or to be so unreliably indi- 
cated by instructors’ marks that it cannot be investigated until a more 
reliable criterion of musical performance is found. 


Bienstock,® on the other hand, maintained that “with some effort and
experimentation, the marks given by teachers for musical performance
can be sufficiently reliable to measure individual achievement.”

Ability in musical performance seemed to be only slightly related to 
the ability to comprehend music theory. Highsmith!’ found a cor- 
relation of .37 between applied music marks and music theory, while 
Bienstock® reported r’s of .25 and .13 when marks for theory were 
correlated with marks in instrumental work and orchestral perform- 
ance respectively. More” concluded that “there seems to be a 
tendency toward more accurate prediction of probable success in 
theoretical music than in applied music.” 


OTHER TESTS FOR PREDICTING SUCCESS IN MUSICAL ACHIEVEMENT 


In the search for adequate measuring tools for the prediction of 
musical success all clues should be investigated. These are important 
for their negative as well as their positive aspects. In other words, it is
just as important to know which aspects of the problem to ignore as
which ones to pursue. Diserens® suggested that if testers fail in
part, it may encourage them to prepare better tests. He also recommended
that the work of all testers be surveyed even if the results are
meagre, with the hope that “such surveys may induce the musician to
lend his aid in such a worthy undertaking.” In considering the talent
tests available at present Tamaoka™ agreed with the results of many 
other investigators in finding the Seashore Pitch and Tonal Memory 
Tests the two most important and fundamental criteria in the test of 
musical talent. Of the K-D tests Pawlowski*®* reported tests II, III, 
V, and VII to be of diagnostic value. Another test which might prove 
helpful is that devised by Ortmann.** The validity of this test was 
studied over a ten-year period with over five hundred students and it 
was reported to be higher than that of the Seashore measures. At
present this test can only be given at the Peabody laboratory, though 
further study might make it available to larger groups. Mursell? 
believed that “what the field urgently needs is research of a more
practical kind, in contact with actual learning and teaching situations.” 
He suggested the use of apparatus such as the phonophotograph and 
the Ampico recording piano as described by Tiffin.5?5* Jersild' 
recommended the construction of “tests of awareness of tonal-rhythmic 
configurations or emotional responsiveness thereto.” It is possible
that the problem of testing musical ability might be considerably 
simplified if we had more reliable measures of achievement. Seashore*® 
maintained “there is a wide central area of achievement that can be
measured by paper and pencil tests for individuals or groups.” He
offered rules for the construction of a sight-reading scale which were
suggested by his “happy experience in the use of the Knuth Achievement
Test in Music.?°”

Some investigators have experimented with other types of measures 
which were included in Table I. The amount of musical training 
received yielded positive though low coefficients when correlated with 
grades in ear-training, theory, or applied music. Performance tests of 
various types have been devised. These have often included detailed 
sight-singing or ear-training exercises as well as playing and singing of 
prepared compositions. Wright, Salisbury and Smith,*® and Wilson® 
found a marked relationship between their music performance tests and 
marks in sight-singing and ear-training. 

The literature also contained reports of a few other tests. Taylor®! 
suggested that it was inadvisable to predict success in music on the basis 
of hand formation. Lamp and Keys** concluded that anatomical 
formation, measurements such as length or slenderness of fingers, thick- 
ness of lips or evenness of teeth, have no appreciable relationship to 
performance upon the clarinet, the violin, or the brass horn. A com- 
bination of these measures may have some predictive value for the brass 
horn. They also reported that the Seashore tests did not yield an 
index of aptitude which was adequate for the individual guidance of 
students desiring to study brass, woodwind, or stringed instruments. 
Further experimentation with tests of dexterity and motor coördination
for predicting success in acquiring instrumental technique should prove
helpful. Salisbury and Smith*® strongly suggested that “measures of
factual knowledge about music notation proved of small prognostic
value in contrast with the two Seashore tests and in contrast with
measures which involved auditory perception or imagery as well as
knowledge of notation.” The present status of guidance materials
was best summarized by Seashore** who maintained that while “every
type of information of the most authentic sort that will apply to a
clarification of the next turn in the course” should be utilized, the guidance
program “in a sense should always be of a negative and protective
character.” While our measures may be statistically reliable, they
may not always be adequate for predicting individual achievement. 

The form our final instrument for measuring musical aptitude will 
assume is still uncertain. Mursell*! considered that it “will be largely
in the form of a set of tests given individually . . . no conceivable
group procedure is suitable for dealing with such presumably important
integrative functions as melodic production and reproduction and
rhythmic production.” An interesting report on the effect of individual
versus group administration of the same test was that of Capurso.6 He 
retested ninety-five children, aged seven to fourteen years, on the Sea- 
shore Pitch and Intensity tests and also obtained careful ratings from 
their teachers on the emotional adjustment of each child. While he 
could not conclude from his experimental evidence (since only the retest 
was given individually) that “‘being alone’ when taking the Seashore
test will lower the score for the maladjusted individual and raise the
score for the adjusted one,” that, however, was his opinion.

Recently, new interest has been displayed in the theoretic basis of 
music testing. Lowery”? summarized the two current theories. He 
described the integrative theory of Drake and Seashore and the omni- 
bus theory upheld by Mursell. The integrative theory he designated 
as “an attempt to analyze musical talent into factors or parts” while
the omnibus theory embraces the Gestaltist principles. Lowery believed
there was a place for both methods of testing. The integrative tests 
may be used for the preliminary selection of ability, while tests con- 
structed along the omnibus lines should be applied to the more advanced 
stages of musical activity. 

In considering the many factors which constitute musical talent 
Drake,'! by means of the Spearman tetrad-difference technique, 
revealed there are five, and possibly more, separate abilities constituting
musical talent. He maintained that the two capacities, pitch
discrimination and auditory memory, are distinct and fundamental for
the musical mind and recommended that these always be included in any music
testing program. Karlin” used Thurstone’s multiple-factor analysis
technique to analyze two different batteries of music tests and he
reported that there is a “stability of possible music factors.” This
study is still in progress.
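
Drake's data and computations are not reproduced in this review. As a general illustration of the tetrad-difference criterion he is said to have used, the Python sketch below computes the three tetrad differences for four tests from their intercorrelations. When a single common factor accounts for all the correlations, every tetrad difference vanishes; differences reliably different from zero suggest more than one factor. The loadings and function names are invented for the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def tetrad_differences(r: Matrix, i: int, j: int, k: int, l: int) -> Tuple[float, float, float]:
    """Return the three tetrad differences for tests i, j, k, l of a correlation matrix."""
    return (
        r[i][j] * r[k][l] - r[i][k] * r[j][l],
        r[i][j] * r[k][l] - r[i][l] * r[j][k],
        r[i][k] * r[j][l] - r[i][l] * r[j][k],
    )


def one_factor_correlations(loadings: List[float]) -> Matrix:
    """Correlations implied by a single common factor: r_ab = loading_a * loading_b."""
    n = len(loadings)
    return [
        [1.0 if a == b else loadings[a] * loadings[b] for b in range(n)]
        for a in range(n)
    ]


# Hypothetical loadings of four tests on one general factor (not Drake's figures).
single_factor = one_factor_correlations([0.8, 0.7, 0.6, 0.5])
print("one factor:", tetrad_differences(single_factor, 0, 1, 2, 3))

# Raising one correlation above what the single factor implies breaks the pattern.
disturbed = [row[:] for row in single_factor]
disturbed[0][1] = disturbed[1][0] = 0.9
print("disturbed: ", tetrad_differences(disturbed, 0, 1, 2, 3))
```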

The status of testing and guidance in music is beginning to emerge 
as a subject worthy of intensive effort by both psychologists and musi- 
cians. The results, however, are far from conclusive at the present 
time. 

During the examination of approximately six thousand selective
service registrants from the cities of New Orleans, La.; Mobile, Ala.; 
and Pensacola, Fla.; and from the rural sections adjacent to these 
cities, the problem of how to deal objectively with those registrants 
who claimed illiteracy presented itself. In the examination of such 
individuals the following specific problems arise: 

(1) How to gain assurance, by objective measurement, that the
subject, though illiterate, is not mentally deficient.

(2) How to identify conclusively the mental defective in a com- 
paratively brief period of time. 

(3) How to distinguish the malingerer from the true illiterate or 
mental defective. 

This preliminary study was undertaken to determine if it were 
possible to devise an objective measure by which the interviewer could 
be assured that, though the individual was an illiterate, he was not 
mentally defective. There is an urgent need at the present time for 
such a measure. The examiner must be able to determine quickly
whether the illiterate candidate will make a suitable recruit. Most
tests measuring mental ability are verbal (written) and, therefore, 
are useless with those claiming illiteracy. The non-verbal and per- 
formance tests now available are too time-consuming or require too 
elaborate apparatus. A short test which can be administered without 
the use of apparatus and in the space of a few minutes’ time is highly 
desirable. Such a test should not attempt to classify the subject 
as to his particular mental age but should indicate whether the regis- 
trant has sufficient mental ability to be passed or whether further 
psychometric examination is advisable. 

In this study any individual who had a mental age of nine years 
or less as measured by the Kent Emergency Test (E-G-Y) was considered
mentally deficient. With the group of illiterates whose mental
age falls within the range of ten to eleven years, schooling is an important
factor. Obviously lack of or insufficient schooling, in such cases,
does not necessarily indicate mental deficiency. Neither does it 
indicate an individual’s lack of ability to become, in a short time, 
sufficiently literate to qualify as a soldier. There is a probability 
that individuals whose mental age is higher than eleven years may not 
be illiterate regardless of a lack of schooling inasmuch as their greater 
amount of intellectual ability tends to allow them to learn to read 
and write simple material spontaneously. According to Terman’s 
classification of mental ability as measured by the Intelligence Quotient,¹
any individual with an IQ of 70 or less may be considered to
be definitely feebleminded. In an adult, an IQ of 70 indicates a 
mental age of approximately ten years. Therefore a test that is 
pointed to a ten-year mental age scale should serve to differentiate 
the mentally defective illiterate from the illiterate with normal or 
low normal intelligence. The examination of registrants was partially 
completed before the problem outlined above manifested itself suffi- 
ciently to indicate the need for this study. Eighty-eight of the regis- 
trants who claimed illiteracy were consequently referred to the inter- 
viewer. These registrants presented no psychiatric problem other 
than that of determining whether they possessed sufficient mental 
capacity to be capable of making satisfactory adjustment to army 
requirements. 
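
The step from an IQ of 70 to a mental age of about ten years follows from the ratio definition of the IQ, IQ = 100 × MA / CA, in which the chronological age of an adult is replaced by a fixed ceiling. The text does not state the ceiling used, so the Python sketch below treats it as an explicit assumption and shows the result for several plausible values. The function name is hypothetical.

```python
def mental_age(iq: float, adult_ceiling_years: float) -> float:
    """Mental age implied by a ratio IQ when the adult's chronological age
    is replaced by an assumed ceiling: MA = IQ * CA / 100."""
    if iq <= 0 or adult_ceiling_years <= 0:
        raise ValueError("IQ and ceiling age must be positive")
    return iq * adult_ceiling_years / 100.0


# The ceiling ages below are assumptions; the article does not say which was used.
for ceiling in (14, 15, 16):
    print(f"IQ 70, adult ceiling {ceiling} years: MA = {mental_age(70, ceiling):.1f} years")
```

A ceiling of fourteen years gives 9.8 years, close to the ten-year figure used here; higher ceilings give correspondingly higher mental ages.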

Their chronological ages ranged from twenty to thirty-six years. 
The amount of schooling ranged from none to third grade. Only one 
registrant is listed as having attained the fourth grade. Occupations 
included farmers, mill workers, truck drivers, and isolated cases of 
butler, porter, painter, etc. Fifty-three or approximately sixty per 
cent were white; the remainder, colored. All claimed they could 
neither read nor write. 

The series of questions used by the interviewer was presented
to the registrant after preliminary remarks had served to make him
as comfortable as possible. As in the use of any test, much of the
success of its administration depends upon the examiner. He is in
direct contact with the subject. He can note by observation the
speed and manner with which answers are given. He is also in a
position to evaluate the quality of those answers. These questions
were never presented in a mechanical manner but in a conversational
tone. Whenever there was a question about the response, the subject
was given the benefit of the doubt.


¹ Terman, L. M.: The Measurement of Intelligence, Boston: Houghton Mifflin
Co., 1916.


TABLE I.—CHRONOLOGICAL AGES

TABLE II.—SCHOOLING

Many questions which do not appear in the final series were tried 
and evaluated in preliminary study. Examination of results of 
successful and failing answers to those questions which have been 
eliminated showed that they did not substantially affect the final 
rating. The series of questions presented to the registrants follows: 

(1) When were you born? 

(2) How many days are there in a week? 

(3) How many months in a year? 

(4) What is the third (or fourth) month? 
(5) If you buy a package of cigarettes for fifteen cents and gave the 
man a dollar, how much change will you receive? 

(6) If you bought something for eighty cents and gave the man a 
ten-dollar bill, how much change would you receive? 

Individuals who are mentally defective differ from those of normal 
intelligence particularly in power of comprehension, ability to direct 
thought, amount of information possessed and spontaneity of atten- 
tion. The series of questions was formulated with these abilities 
in mind and attempts to include tests for as many as possible in so 
brief a series. Questions 2, 3, and 4 test the time orientation of the 
individual. In the mentally inferior, the mental associations are 
weaker and less numerous than in the normal, making this test difficult 
for them. A defective intelligence on the nine-year level can be 
expected to do about as well as the normal child with a ten-year mental 
age. Inferior responses to these questions, other things being equal, 
are usually indicative of inferior intelligence. Questions 5 and 6 are 
the most important of the series. Making change tests several 
abilities of the individual. The arithmetical computation in the 
problems is not difficult and knowledge of the process involved is 
possessed by nearly all, regardless of a lack of schooling, who are not 
feeble-minded. The first step for the subject is comprehension of the 
problem. The fundamental operation which applies to the situation 
must then be selected. This step is particularly perplexing to the 
mental defective. Thus, these questions give an indication of the ini- 
tiative, the judgment, and the power to reason possessed by the subject 
being tested. 

No numerical values are given to the individual answers. The 
final score is merely passed or failed. Any individual who can answer 
questions 5 and 6 accurately and one or more of the first four questions 
satisfactorily may be considered passed. Any individual who cannot 
answer question 5 or 6 satisfactorily and fails on any of the preceding 
questions has failed. Immediately succeeding the presentation of 
the series, the Kent Emergency Test E-G-Y² was administered to the 
subject. This test of mental ability was chosen because it was 
primarily designed to be given where a quick measure of intelligence is 
wanted. It was also devised for use with adults. Results of previous 
use of this test show a high correlation with the Stanford Revision 
of the Binet-Simon scale. 

1 Terman, L. M.: op. cit. 
2 Kent, Grace H.: Kent Emergency Test E-G-Y, Mental Measurement 
Monographs, January, 1932, No. 9. 
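
The pass-fail rule stated for the series may be sketched in code. This is a minimal illustration only: the function name and the dictionary layout are hypothetical, and the "undetermined" outcome is an assumption added here for answer patterns that the stated rule does not cover (for example, both change-making questions correct but none of the first four).

```python
from typing import Mapping


def score_series(correct: Mapping[int, bool]) -> str:
    """Return 'passed', 'failed', or 'undetermined' for one registrant.

    `correct` maps question numbers 1-6 to True (satisfactory answer)
    or False. 'undetermined' is an assumption of this sketch for
    patterns the stated rule leaves open.
    """
    first_four = [correct[q] for q in (1, 2, 3, 4)]
    both_change = correct[5] and correct[6]  # questions 5 and 6 both right

    # Passed: questions 5 and 6 correct, plus at least one of questions 1-4.
    if both_change and any(first_four):
        return "passed"
    # Failed: misses question 5 or 6 and also misses one of questions 1-4.
    if not both_change and not all(first_four):
        return "failed"
    return "undetermined"


example = {1: True, 2: True, 3: False, 4: True, 5: True, 6: True}
print(score_series(example))
```

In practice the examiner's judgment, which the text stresses, would settle the cases this sketch marks as undetermined.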
From a study of Table IV it may be observed that: 

(1) Of thirty-four white subjects scoring a mental age of ten years 
or more, twenty-four passed the series and ten failed. 

(2) Of eighteen Negro subjects scoring a mental age of ten years 
or more, sixteen passed and two failed. 

(3) Of the total of fifty-two subjects with a mental age of ten years 
or better, forty passed and twelve failed. 

(4) Of nineteen white subjects who scored a mental age of nine 
years or less, all nineteen failed the series. 

(5) Of seventeen Negro subjects scoring a mental age of nine years 
or less, fourteen failed while three passed the series. 

(6) Of the total of thirty-six cases with a mental age of nine years 
or less, thirty-three failed the series and three passed. 

Summing up, it may be seen that the series of questions was 
passed by forty subjects who scored a ten-year or better mental age 
on the Kent scale and failed by thirty-three with a mental age of nine 
years or less. Thus, in seventy-three cases or 82.9% of the total the 
series indicated whether the individual possessed sufficient mental 
ability to make satisfactory adjustment or whether he was mentally 
deficient. 
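
The arithmetic behind this summary can be checked from the six counts listed above. The sketch below uses only the figures given in the text; the dictionary layout and the function name are illustrative assumptions. The agreement proportion works out to 73/88, or about 0.8295, which the text gives as 82.9%.

```python
# Counts reported in the text for Table IV:
# (group, Kent mental age band) -> (passed series, failed series)
counts = {
    ("white", "10 or more"): (24, 10),
    ("Negro", "10 or more"): (16, 2),
    ("white", "9 or less"): (0, 19),
    ("Negro", "9 or less"): (3, 14),
}


def summarize(counts):
    """Return (total cases, cases where series and Kent scale agree, proportion)."""
    total = sum(p + f for p, f in counts.values())
    # Agreement: passed with a mental age of ten or more, or failed with nine or less.
    pass_high = sum(p for (_, band), (p, _) in counts.items() if band == "10 or more")
    fail_low = sum(f for (_, band), (_, f) in counts.items() if band == "9 or less")
    agree = pass_high + fail_low
    return total, agree, agree / total


total, agree, rate = summarize(counts)
print(f"{agree} of {total} cases agree ({rate:.4f})")
```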

Since this is but a preliminary study, only tentative conclusions 
may be suggested. The results seem to indicate, 
(1) That an objective measure of mental ability to be administered 
in a brief period of time can be found. 

(2) That such a measure passed by an illiterate subject could 
definitely eliminate him from further mental testing at the time of 
the examination. 

(3) That further study on such a series of questions would bring 
worth-while results. 

In those cases where the individual fails the series, the judgment 
of the examiner would determine whether a further mental examina- 
tion was advisable. In many instances, the speed of reaction and 
the quality of the answers indicate clearly that the registrants are 
of marked inferior intelligence. In any case where there is a doubt 
or a suspicion that the subject is malingering, a further and more 
extensive examination is indicated. It is believed that a follow-up 
with a much larger number of cases and with a possible refinement in 
the list of questions would ultimately result in an objective measure 
for determining the mental capacity of the illiterate registrant that 
would be an aid to the interviewer. 

It may be well to note that the number of malingerers found during 
the examination was negligible and does not appear in the tables. 
Such cases were easily identified because of the manner in which they 
answered questions. Any individual who answers questions 1, 2, and 
3 in a negative manner and then unhesitatingly answers question 4 
and either 5 or 6 correctly, gives evidence of malingering. 
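
The response pattern just described can be written as a simple check. This is a sketch under stated assumptions: the three answer labels are hypothetical, "negative" is taken to mean that the registrant professes not to know, and the element of answering "unhesitatingly" is not modeled, since it rests on the examiner's observation.

```python
def suggests_malingering(answers: dict) -> bool:
    """answers maps question number (1-6) to 'correct', 'incorrect', or 'negative'."""
    # Questions 1, 2, and 3 all answered in a negative manner.
    denies_easy = all(answers[q] == "negative" for q in (1, 2, 3))
    # Question 4 correct, together with either question 5 or question 6.
    harder_ok = answers[4] == "correct" and (
        answers[5] == "correct" or answers[6] == "correct"
    )
    return denies_easy and harder_ok


pattern = {1: "negative", 2: "negative", 3: "negative",
           4: "correct", 5: "correct", 6: "incorrect"}
print(suggests_malingering(pattern))
```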

This study is not offered as a panacea for the neuropsychiatrist 
who, in examining registrants, must differentiate the illiterate 
non-defective from the illiterate defective, and both of these from the 
poorly educated individual who thinks that failing to coöperate in a 
literacy test may be an easy way to postpone or even avoid service 
in the army. Although we have too few cases to draw scientific 
conclusions, the trend indicated by the study is important and the study 
itself is sufficiently timely to warrant drawing attention to the basic 
maneuver, which is simply to formulate a very short objective test of 
intelligence pointed toward the ten-year level. This mental age 
level was chosen since, at the time of testing, it was used as the 
minimum level for acceptance of men for service in the army of the United 
States. 








THE COMPARATIVE EFFECTIVENESS OF CERTAIN 
STUDY TECHNIQUES IN THE FIELD OF HISTORY 
Training students to study effectively is a function assumed by 
teachers and administrators alike. In recent years their task has been 
simplified by the publication of pamphlets, manuals, and textbooks 
that show students how to study. Despite the attention paid to 
teaching effective learning habits, very little evidence is available to 
suggest the relative effectiveness of the several techniques 
recommended. To ascertain, under certain pre-determined conditions, the 
relative merits of four study techniques was the purpose of this 
investigation. 


STATEMENT OF THE PROBLEM 


This study was devoted to comparing the effectiveness of four 
study techniques as applied by college students to historical textbook 
materials. The four techniques compared were: Repetitive reading, 
involving no writing whatsoever; underscoring and marginal note- 
making, consisting of writing on the text page, but not outlining the 
material; outlining, a topical listing of important items in their proper 
relationships; and précis-writing, a brief summation of material 
studied. 

The subjects of the investigation comprised most of the freshman 
and sophomore students enrolled at a four-year liberal arts college, 
Upsala College, East Orange, New Jersey, during the Autumn semes- 
ter of the academic year 1940-1941. A total of two hundred forty- 
two students, consisting of one hundred fourteen freshmen and one 
hundred twenty-eight sophomores, were studied. 

The investigation sought to answer the following questions: Is 
any technique superior to the others, measured in terms of immediate 
and delayed recall achievement tests, (1) for all students taken collec- 
tively? (2) for low-scoring students on a general intelligence test as 
compared with the high-scoring students? (3) for students of high 
reading ability as compared with those of low reading ability? and 
(4) for students showing superior social studies knowledge as compared 
with those who do not? 

Experimental evidence bearing on these questions should prove 
of immeasurable value in formulating teachers’ study 
recommendations to students, especially when certain standardized tests are used 
as segregating criteria. 


THE METHODS AND PROCEDURE 


A system of equated student-grouping and a rotation of groups 
was adopted. The groups, four in number, represented a cross-section 
of each class population being studied. Each group contained as 
wide a range as possible of the scores attained by the subjects on a 
standardized general intelligence test, and the means and standard 
deviations of the groups were made as identical as the limitations of 
the sampling permitted. Utilizing the same basic grouping, but 
segregating the scores within each group, made possible a comparison 
of the outcomes of the techniques for the students who scored at high 
and low levels on the intelligence test, on the reading comprehension 
test, and the social studies test that were employed as segregating 
criteria. 

That errors in sampling might be reduced to a minimum, the 
techniques used by each of the four groups were rotated among the 
groups. The final results for each technique-application, therefore, 
became a composite of the scores of all the students for each technique. 
Furthermore, in order that the results should not suffer from the 
unreliability of a single trial, three separate series of four rotations, 
or three results from the use of each technique, were measured. 
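
The rotation scheme just described amounts to a Latin-square arrangement repeated for three series. The following sketch is offered as an illustration only. The cyclic order of rotation is an assumption, since the text states only that each group used every technique once within a series; the first-session assignment follows the one reported for Groups A to D.

```python
TECHNIQUES = ["outlining", "précis-writing", "underscoring", "repetitive reading"]
GROUPS = ["A", "B", "C", "D"]


def rotation_schedule(n_series=3):
    """Return {(series, session): {group: technique}} using a cyclic shift.

    The cyclic order is an assumption of this sketch, not a detail
    reported in the study.
    """
    schedule = {}
    n = len(TECHNIQUES)
    for series in range(1, n_series + 1):
        for session in range(n):
            # Each session shifts every group one step along the technique list.
            schedule[(series, session + 1)] = {
                group: TECHNIQUES[(g + session) % n]
                for g, group in enumerate(GROUPS)
            }
    return schedule


schedule = rotation_schedule()
for session in range(1, 5):
    print(session, schedule[(1, session)])
```

Under this arrangement every technique is applied to every unit of material by exactly one group, and every group applies every technique once per series, so that differences between groups and between materials are balanced across techniques.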

The essential steps of the procedure, in actual sequence, were: 

(1) A general intelligence test¹ was administered to all 
participating students. 

(2) A standardized test of reading comprehension² was 
administered. The scores on this test were later employed to establish 
whether or not any definite relationships existed between reading 
comprehension ability, so measured, and comparative effectiveness 
of the several techniques. 

(3) A standardized social studies test³ was administered to all 
participating students in order to ascertain whether or not proficiency 
in this field, as measured by this test, bore any relationship to the 
relative effectiveness of these techniques when applied to historical 
materials. 

1 American Council on Education Psychological Examination for College 
Freshmen, 1939 and 1940 editions. 
2 Coöperative English Test C2, Reading Comprehension. Form Q, Coöperative 
Test Service, 1940 edition. 
3 Test of General Proficiency in the Field of the Social Studies. Revised Series, 
Coöperative Test Service, 1940 edition. 

(4) Each of the students filled out a questionnaire which was 
designed to determine whether the student had previously studied 
materials which were subsequently used in the investigation. 

(5) Three weeks were devoted to giving instruction to all par- 
ticipating students in the nature and use of the four techniques. This 
instruction included drill work, homework assignments, and proficiency 
tests. 

(6) Twelve weekly class study-sessions were held with each group 
during which the students studied twelve separate bodies of materials. 
These twelve materials were arranged in three series of four units each. 
Each of the four equated groups within the freshman and sophomore 
classes applied a particular one of the four selected techniques to each 
material unit. Thus, for the first such session, Group “A” applied 
outlining; Group “B,” précis-writing; Group “C,” underscoring; and 
Group “D,” repetitive reading to the identical material. At the next 
session, each group applied a different technique to the second unit 
of material. By varying the techniques among the four groups to 
eliminate grouping-errors, each group employed every one of the four 
techniques in studying one of the first four materials. A similar 
procedure was followed for the second and third series. From thirty 
to forty minutes were allowed for studying each unit of material. 

The materials chosen for technique-application were taken from 
the field of history. Because Latin American history offered freedom 
from pre-knowledge, as the student questionnaire had confirmed, the 
materials were selected from that area of history. Only excerpts 
from standard college textbooks were employed, and different portions 
from three were selected to obtain a cross-section of different types 
of historical material. All materials were presented in mimeographed 
form, and were of a length which previously conducted time-trials had 
proved satisfactory for the study time-limit of each session. 

(7) The study of each material was followed by the administration 
of a mimeographed, objective immediate-recall test based on that 
material. The same test was administered about five weeks later 
to test delayed recall. 

These tests were devised to measure both (1) factual retention 
and (2) grasp of thought processes, such as chronological, cause and 
effect, and main and subordinate relationships. Single-word 
completion recall, single-word answer recall, single-choice 
recognition, and parallel-column matching questions were employed. The 
questions were selected so as to be truly representative of the material 
being studied, and were constructed in accordance with the generally 
accepted principles of objective-test preparation. Preliminary trials 
among upper classmen gave a basis for selecting and rejecting ques- 
tions. Although every effort was made to design tests which would 
produce valid, reliable measures of the predetermined objectives, 
facilities did not permit large-scale pre-evaluations of test items. 
Evidence of reliability among the tests is to be found in the general 
consistency of final results obtained from the use of the twelve inde- 
pendent tests when applied to varying types of material content. 

The scores attained on these tests were employed to compare the 
relative effectiveness of the techniques. These tests were designed 
to measure both factual retention and thought processes. Each test 
was administered within a ten-minute time period. 

Several precautions were taken to insure the validity of the tech- 
nique comparisons. The technique instruction period served to 
equalize student skill in the use of the four methods. Rigid, identical 
time-limits were maintained for all studying and testing in order to 
eliminate the factor of time variations. Invalidation of results because 
of material variation was eliminated by employing the rotation 
technique, and by conducting three distinct experiments for all 
participants. To obtain maximum effort in student performance, 
course grades, college credits, and personal technique evaluations 
were given to all participants who coöperated fully. In addition, the 
students were made conscious of the research value of full coöperation. 
All directions for studying and testing were given to the students in 
written form to eliminate any error resulting from misunderstanding 
of the procedures. 


SIGNIFICANT FEATURES OF THE STUDY 


Equation of the groups was tested by the usual process of obtaining 
critical ratios from mean differences and their standard errors, and 
also by a “small samples” technique in which the significance of 
mean differences was measured by a “t” score formula. 

Both methods showed no significant mean differences existing for 
any of the paired groups. Critical ratios obtained by the first method, 
namely, dividing the mean differences by the standard errors of mean 
differences, did not exceed .279, or sixty-two chances in a hundred of a 
difference greater than zero. The “t” scores obtained by the “small 
samples” technique are all such as would normally occur between 
seventy and one hundred per cent of the time. 

Dispersion similarity was tested by the conventional method of 
dividing standard deviation differences by their standard errors, with 
resulting critical ratios varying between .067 and .600, that in all cases 
exceeded fifty-three chances in a hundred of a chance difference. A 
“small samples” method for testing dispersion was also employed, 
yielding “F” values. 

All “F” values were such as would normally occur at least twenty 
per cent of the time. These results confirmed the equation of the 
groups, an essential for making the comparisons meaningful. 
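
For readers who wish to repeat checks of this kind, the three statistics named above may be sketched as follows. The formulas printed in the original are not reproduced in this text, so the forms below are the standard textbook ones and are an assumption as to what was actually used: the critical ratio as a mean difference over its standard error, "t" with a pooled variance for two independent groups, and "F" as the larger sample variance over the smaller. The scores are hypothetical.

```python
import math
from statistics import mean, variance


def critical_ratio(a, b):
    """Mean difference divided by its standard error (large-sample form)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def pooled_t(a, b):
    """Small-samples t for two independent groups, using a pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def variance_ratio(a, b):
    """F taken as the larger sample variance over the smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


# Hypothetical intelligence-test scores for two groups (not data from the study).
group_a = [98, 105, 110, 121, 93, 117]
group_b = [101, 99, 113, 118, 95, 116]
print(critical_ratio(group_a, group_b))
print(pooled_t(group_a, group_b))
print(variance_ratio(group_a, group_b))
```

Values of all three statistics near their null expectation (a critical ratio or "t" near zero, an "F" near one) are what equated groups should show.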

Technique-results were computed by obtaining the differences of 
the means for each pair of techniques, and establishing the reliability 
of the differences by the use of the “t” score method. Table I shows 
these results for all students taken collectively without segregation, but 
sectionally equated. No reliable difference of one technique mean 
over another occurs consistently during all three trials or series for 
either class on immediate or delayed recall. Yet, the preponderance 
of occasions, particularly for immediate recall, when reading and 
underscoring are favored, suggests a definite trend of superiority for 
these methods compared with the other two. Each technique was 
involved in thirty-six comparisons with the other techniques. In 
thirty of these, reading was superior; underscoring exceeded other 
techniques in twenty-six cases; précis-writing proved superior in only 
thirteen instances, and outlining in only three. Except for three 
situations in which it outranked précis-writing, the outlining technique 
constantly made the poorest showing. 
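
The tallies quoted above can be recomputed from the favored-technique letters of Table I. In the sketch below the letters are transcribed from that table in column order (freshmen immediate, sophomores immediate, freshmen delayed, sophomores delayed). Where a printed letter is hard to read, the one chosen is that which keeps the mean differences within a column additive, and this transcription should be treated as an assumption.

```python
from collections import Counter

# O = outlining, R = reading, U = underscoring, P = précis-writing.
# Keys name the pair compared; each string gives the favored technique
# in the four columns of Table I.
FAVORED = {
    1: {"O-R": "RRRR", "R-U": "RRUR", "P-U": "UUPU",
        "P-O": "PPPP", "P-R": "RRPR", "O-U": "UUUU"},
    2: {"O-R": "RRRR", "R-U": "UURR", "P-U": "UUUU",
        "P-O": "POPP", "P-R": "RRRR", "O-U": "UUUU"},
    3: {"O-R": "RRRR", "R-U": "RRUR", "P-U": "UUPU",
        "P-O": "POPO", "P-R": "RRPR", "O-U": "UUUU"},
}


def tally(favored):
    """Count how often each technique is favored over all comparisons."""
    wins = Counter()
    for pairs in favored.values():
        for letters in pairs.values():
            wins.update(letters)  # one count per favored letter
    return wins


wins = tally(FAVORED)
print(dict(wins))
```

The counts obtained in this way agree with the figures in the text: reading thirty, underscoring twenty-six, précis-writing thirteen, and outlining three, the last all against précis-writing.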

TABLE I. MEAN DIFFERENCES AND “t” SCORES OF IMMEDIATE AND DELAYED 
RECALL RESULTS FOR ALL STUDENTS, SECTIONALLY EQUATED 

                                            Immediate recall                      Delayed recall
                                       Freshmen         Sophomores          Freshmen         Sophomores
Series  Techniques compared          M1-M2  t score¹   M1-M2  t score¹    M1-M2  t score¹   M1-M2  t score¹

1       Outlining and reading        2.668  3.348 R**  3.432  4.708 R**    .620   .669 R    1.144  1.264 R
        Reading and underscoring     1.121  1.363 R    1.221  1.935 R      .916   .862 U     .364   .509 R
        Précis and underscoring       .213   .304 U    1.508  2.142 U      .658   .651 P*    .110   .145 U
        Précis and outlining         1.334  1.991 P     .703   .887 P     2.194  2.529 P     .670   .717 P
        Précis and reading           1.334  1.937 R    2.739  3.817 R     1.574  1.580 P     .474   .520 R
        Outlining and underscoring   1.547  1.917 U    1.508  2.100 U     1.536  1.617 U     .780  1.048 U

2       Outlining and reading         .759  1.183 R    1.017  1.878 R      .473  1.018 R     .873  1.909 R
        Reading and underscoring      .565   .884 U     .780  1.350 U      .130   .275 R     .220   .446 R
        Précis and underscoring      1.278  2.278 U    4.500  7.449 U**    .260   .530 U     .407   .794 U
        Précis and outlining          .046   .082 P    2.703  4.660 O**    .083   .175 P     .246   .514 P
        Précis and reading            .713  1.152 R    3.720  6.273 R**    .390   .762 R     .627  1.277 R
        Outlining and underscoring   1.324  2.259 U    1.797   .325 U      .343   .778 U     .653  1.359 U

3       Outlining and reading         .713  1.123 R    1.356  2.414 R*     .463   .891 R     .669  1.199 R
        Reading and underscoring      .083   .131 R     .212   .403 R      .389   .769 U     .152   .259 R
        Précis and underscoring       .167   .286 U    1.305  2.339 U*     .037   .079 P     .593  1.030 U
        Précis and outlining          .463   .792 P     .161   .272 O      .889  1.843 P     .076   .140 O
        Précis and reading            .250   .400 R    1.517  2.666 R**    .426   .904 P     .745  1.333 R
        Outlining and underscoring    .630  1.060 U    1.144  2.072 U      .852  1.068 U     .517   .901 U

1 This column also shows which technique was favored in the comparison by giving the first letter 
of the name of the technique. The double asterisk, following the letter, indicates the 0-1 per cent 
level of “t” score significance; the single asterisk indicates the 1-2 per cent level. 

The previously described statistical procedure for evaluating results 
was applied in all of the other comparisons, involving segregated groups 
of high- and low-scoring students based on the intelligence, reading 
comprehension, and social studies tests. Students were segregated by 
dividing each section in half after scores had been arranged in rank 
order. 
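
The segregation of students into high- and low-scoring halves can be sketched as a simple rank-order split. This is an illustration under stated assumptions: the function name and the sample scores are hypothetical, and the handling of a section with an odd number of students (the middle student is placed in the lower half here) is not specified in the text.

```python
def split_high_low(scores):
    """scores: {student_id: test score}. Returns (high_ids, low_ids).

    Students are ranked from highest to lowest score and the list is
    divided in half; with an odd count the extra student falls in the
    lower half (an assumption of this sketch).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Hypothetical scores for one section.
section = {"s1": 110, "s2": 95, "s3": 130, "s4": 102}
high, low = split_high_low(section)
print(high, low)
```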

Upper- and lower-scoring students on the intelligence test showed 
no substantial variation in their use of the techniques. For the 
upper-scoring students, only one significant “t” score resulted for 
immediate and delayed recall combined, a score of 2.742, by which 
reading showed superiority over outlining among the freshmen in one 
series for immediate recall. The difference was unreliable in the other 
series. For the lower-scoring students, eight significant “t” scores 
occurred, two of which also favored reading over outlining. These 
results were not consistently reliable, for they were contradicted in 
other series of the experiment. 

Both groups gave results that were substantially alike, therefore, 
in that in neither group does one technique show a consistent, reliable 
superiority over the others. This similarity between the groups is 
further emphasized by the fact that reading and underscoring means 
in both cases rank higher than those for the other two techniques, and 
outlining makes the poorest showing for both groups, both in immedi- 
ate and delayed recall results. Thirty-six separate comparisons were 
made between the mean score of each technique and the mean scores 
of the other techniques. For the high-scoring students, the number 
of times each mean score exceeded the others follows: Reading, thirty; 
underscoring, twenty-five; précis-writing, thirteen; and outlining, four. 
The tabulation for low-scoring students is: Reading, twenty-four; 
underscoring, twenty-six; précis-writing, seventeen; and outlining, 
four. 

The two groups which were created on the basis of social studies 
test scores likewise failed to show reliable differentiation in the effec- 
tiveness of one technique over another. The most highly significant 
“t” scores (0-1 per cent level of significance) among the high-scoring 
students are but two in number and are not substantiated in the other 
two series. Lower-scoring students likewise show no consistent, 
significant superiority for any of the techniques, even though several 
highly significant “t” scores exist for the sophomores in the immediate 
recall. With “t” score percentages on the 0-1 per cent level, both 
reading and underscoring are superior to outlining and précis-writing, 
but the significance of these results is contradicted in other trials. 

Both the upper and lower divisions are similar, therefore, in that no 
reliable differences are constantly found. The close harmony between 
the rankings of technique means further substantiates this similarity. 
Reading means were larger than others in twenty-five out of thirty- 
six comparisons for the higher-scoring students, and in twenty-nine 
comparisons for the lower-scoring students; underscoring means 
exceeded others twenty-five and twenty-seven times; précis-writing 
means, fifteen and thirteen times; and outlining means five and three 
times, respectively. The tendencies toward better results for reading 
and underscoring than for the other techniques, and toward the poor 
showing of the outlining method, are repeated. 

The groupings predicated on the reading test show results virtually 
identical with those for the previous groupings. Neither the high-scoring 
nor low-scoring students show significant superiority in the 
comparative use of the techniques for all trials, despite the appearance 
of several such differences in single trials. 

For the higher-scoring students on the reading test, reading means 
exceed the means of the other three methods twenty-six times; under- 
scoring means exceed the others thirty times; précis-writing means are 
the larger eleven times; and outlining means are greater only five times. 
The same comparison for the lower-scoring student gives: Reading, 
twenty-eight; underscoring, twenty-five; précis-writing, thirteen; and 
outlining, six times. Reading and underscoring tend again to be 
favored; outlining again ranks fourth. 
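
Since six pairs of techniques were each compared twelve times (three series, two classes, two recall tests), the four counts reported for any one grouping should total seventy-two when every comparison has a single favored technique. The sketch below checks the counts quoted in the text against that total. Any shortfall it reports may reflect tied means or a slip in printing; the source does not say which, and the sketch assumes nothing further.

```python
# Counts as quoted in the text, in the order
# (reading, underscoring, précis-writing, outlining).
REPORTED = {
    ("all students", "unsegregated"): (30, 26, 13, 3),
    ("intelligence", "high"): (30, 25, 13, 4),
    ("intelligence", "low"): (24, 26, 17, 4),
    ("social studies", "high"): (25, 25, 15, 5),
    ("social studies", "low"): (29, 27, 13, 3),
    ("reading", "high"): (26, 30, 11, 5),
    ("reading", "low"): (28, 25, 13, 6),
}
EXPECTED = 6 * 12  # six technique pairs, twelve comparisons per pair


def shortfalls(reported, expected=EXPECTED):
    """Return the groupings whose counts do not total `expected`, with the gap."""
    return {k: expected - sum(v) for k, v in reported.items() if sum(v) != expected}


gaps = shortfalls(REPORTED)
print(gaps)
```

Five of the seven tabulations total seventy-two; the low-scoring intelligence grouping and the high-scoring social studies grouping fall short by one and two comparisons respectively.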


SUMMARY AND CONCLUSIONS 


(1) No consistent, significant superiority of one technique over 
another was found for the students, either unsegregated or divided into 
high- and low-scoring groups on the basis of intelligence, social studies 
ability, and reading comprehension tests. 

(2) The reading and underscoring techniques, both for students 
taken collectively and grouped according to the standardized test 
criteria, show a tendency toward higher scores than do précis-writing 
and outlining, particularly for immediate recall results. 

(3) The outlining technique shows a tendency toward producing 
the lowest scores for the students as a whole and for students segregated 
into low- and high-scoring groups on the standardized tests. 

On the whole, students do approximately as well by using one of 
these techniques as the others under the conditions of the experiment, 
regardless of their standing on general intelligence, social studies, or 
reading comprehension tests. Reading and underscoring tend to be 
slightly more effective than the other techniques for most students, 
especially on immediate recall tests. Distinctions between freshman 
and sophomore students in the application of these techniques do not 
exist on the basis of statistically significant differences. 

An investigation in which study periods are made flexible enough to 
allow all students the maximum time required for mastery of the mate- 
rial, displacing the time-constant of the present experiment, might 
profitably be pursued. A further investigation of the value of review- 
study from the outline, précis, or underscorings made, as compared 
with single review readings, would approach the whole problem from a 
viewpoint having very practical value. 

A wider application of the several techniques which would include 
college juniors and seniors and different types of study materials would 
give greater comprehensiveness to the investigation of the whole 
problem. 

Inasmuch as the outstanding aim of any study-technique experi- 
ment lies in the predictive or recommendation value which might 
result, the search for criteria that will determine technique-proficiency 
for varying types of students should be continued. All available 
standards that offer a possibility of indicating why certain students 
profit more by using one study method than another should be explored. 
Perhaps the only effective way to make a beginning in this respect is a 
limited individual case study, rather than a large group investigation. 
If any factors other than unpredictable individual differences account 
for relative technique proficiency, it is quite likely that they could be 
more easily discovered in intensive rather than extensive research. If 
and when these factors should be revealed in a small-scale investigation, 
they should, of course, be subsequently tested for large-group 
distributions. 
MEASUREMENTS OF CERTAIN NONVERBAL 
ABILITIES OF URBAN AND RURAL CHILDREN 


EUGENE L. SHEPARD 
Stephens College, Columbia, Mo. 


Mental and physical differences between children from diverse 
culture groups may result, in part, from the type and variety of 
environmental contacts and stimulations which are common to that 
geographical section of the country. The environmental milieu in 
which a child is reared may influence the development of certain 
skills, abilities, and fields of knowledge which are considered most 
significant and valuable for those living in that specific geographical 
or social area. As those abilities are encouraged, certain patterns 
of behavior and mental development may be evidenced. 

Blanket statements have been made concerning the mentality of 
country and city children, but the evidence upon which these asser- 
tions are based is usually dependent upon the results of tests of only 
one type; namely, verbal tests. If a more comprehensive and com- 
plete understanding of the individual’s mental status is to be had, a 
battery of tests which involve a number of different intellectual 
functions should be given. The common practice of classifying chil- 
dren as intellectually inferior or superior on the basis of a Stanford- 
Binet examination, or on the basis of a rating on one of the standard 
group intelligence tests, falls far short of an adequate classification. 
An analysis of the various items included in the current tests of intelli- 
gence will reveal that these tests are heavily weighted with items 
involving a knowledge of words and the ability to use them, and to a 
lesser extent numerical ability. The limitations of the verbal test as a 
measure of “general ability” indicate the necessity for a series of 
tests which measure many groups of traits and abilities if a knowledge 
of an individual’s true potentialities is to be secured. 


The problem of this investigation was to determine whether 
non-verbal differences between comparable groups of urban and rural 
children exist and to identify them. Measurement of these abilities 
provides a basis for the evaluation of the “intelligence” of these groups 
in terms of traits which do not require the use of language for their 
identification or expression. 

Since the primary purpose of this study was to compare two 
groups of children on tests of a non-verbal nature, it was necessary 
to select tests which would not involve the use of language. After 
careful examination of the tests available, the following were chosen 
for the study on the basis of reliability, validity, objectivity, 
availability, and ease of administration: (1) Minnesota Assembly Tests, (2) 
Minnesota Spatial Relations Tests, (3) Revised Minnesota Paper 
Form Board Tests, (4) Kwalwasser-Dykema Music Tests, and (5) 
the Otis Quick-Scoring Mental Ability Test, Beta A. All of the tests 
were non-verbal with the exception of the Otis Quick-Scoring Mental 
Ability Test, which was used for the following purposes: (1) as a 
means of comparing the verbal or linguistic abilities of the two groups, 
(2) as a measure of the normalcy of the groups used, and (3) as a means 
of comparing the present findings with the results of earlier 
investigations of urban-rural intellectual differences. 

The rural children used in this study were drawn from the ele- 
mentary schools in the two small cities of Chanute and Independence, 
Kansas. The communities of Chanute and Independence were the 
focal points for trade and business in the center of a rural area. Chil- 
dren living in these two cities in Kansas, the geographical center of 
rural United States, were chosen as representatives of this particular 
cultural area. Such children were subject to social forces and influences
of a nature best described by the term “rural,” as opposed to
children living in a highly urbanized section such as New York City. 
In 1938 Chanute had a total population of 9,787; Independence had a 
population of 12,211. All of the eleven-, twelve-, and a few of the 
thirteen-year-old boys and girls in four public elementary schools in 
these small cities were examined. The total number of rural children
tested was one hundred and fifty-one. The urban group was chosen
from a representative public school in New York City. A total of one
hundred forty-five urban children took the tests. From this number
a final selection of one hundred four children (fifty-seven boys and
forty-seven girls) was made. An equal number of rural children
were selected from the total of one hundred and fifty-one tested. The 
groups were matched, child for child, on the basis of the following 
variables: (1) Occupation level of parent, (2) chronological age, (3) 
sex, and (4) place of birth (native or foreign). For equating the 
groups with respect to parental occupational level, the Barr-scale 
rating was used. In using this scale it was necessary only to compare 
the occupation which was to be rated with the occupations whose 
scale values were known, and to assign the occupation to be rated the 
value possessed by the scaled occupation which it most nearly matched.
Obviously, identical occupations could not be found for urban and
rural individuals; hence, equivalent values for the occupations in
question were used. It was possible to pair exactly all of the other
variables with the exception of chronological age, and only a mean
difference of ten days was evident in this instance.

The results of all the tests administered to the two groups are 
presented by the data in Table I. 


TABLE I. PERFORMANCE OF RURAL AND URBAN CHILDREN ON ALL TESTS
ADMINISTERED TO THE GROUPS

Test                           | No. of pairs | Mean, rural | Mean, urban | Difference of means | S.E. of difference | Critical ratio | Chances in 100
Paper Form Board               | 104          | 28.1        | 26.8        | 1.32                | 1.21               | 1.10           | 86
Spatial Relations Time Scores  | 104          | 843.0       | 833.0       | 10.80               | 26.63              | .41            | 65
Spatial Relations Error Scores | 104          | 23.7        | 30.6        | 6.85                | 1.52               | 4.50           | 100
Mechanical Assembly            | 104          | 62.2        | 46.9        | 15.3                | 2.41               | 6.30           | 100
Music (Kwalwasser-Dykema)      | 104          | 175.9       | 170.8       | 5.1                 | 1.87               | 2.70           | 99.7
Otis Mental Ability            | 104          | 35.2        | 41.5        | 6.3                 | 1.67               | 3.68           | 100


According to the data in Table I, in four of six instances the difference
of the means of the rural and urban groups is in favor of the rural.
Three of the four ratios favoring the rural are statistically reliable,
while the remaining one indicates a tendency for rural superiority.
The urban subjects, as a group, excel on the Otis Mental Ability Test.
This superiority is statistically significant. On the Paper Form Board
the evidence, although favoring the rural, is equivocal.
Attention should be called to the fact that the rural children most 
effectively demonstrated their superiority on the Mechanical Assembly 
Tests. The difference in mean score between the two groups was 
15.3, as shown in the second column of Table I. The critical ratio
was 6.3, and this is more than twice as large as it needs to be in order
to guarantee that the true difference is greater than zero. This difference
between the groups may be due, in part, to the unfamiliarity
of the urban children with the mechanical devices employed in the
test as was indicated by their blind trial-and-error attempts in certain
instances to assemble the objects.* An incidental comparison which
was made between the mean scores of the rural girls and the urban 
boys showed that the mean for the rural girls was less than three 
points below the mean for the urban boys. In other words, the rural 
girls used in this study were about as proficient in mechanical assembly 
as the urban boys of the same age. This seemed to suggest that even 
in a typical sex differentiated task the type of environment in which 
the child lives may be of importance in producing similarities or 
differences in specialized abilities. 
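
The critical ratios and the "chances in 100" column of Table I can be checked with a little arithmetic. The sketch below is illustrative only. It assumes that the critical ratio is the difference of means divided by the standard error of that difference, and that "chances in 100" is the normal-curve probability that the true difference lies in the observed direction. The article states neither formula; the second assumption is adopted because it agrees closely with the tabled values. The function names are ours.

```python
import math


def critical_ratio(diff, se_diff):
    """Difference between two means divided by the standard error of the difference."""
    return diff / se_diff


def chances_in_100(cr):
    """Assumed reading: normal-curve chances (per 100) that the true difference exceeds zero."""
    return 100.0 * 0.5 * (1.0 + math.erf(cr / math.sqrt(2.0)))


# Mechanical Assembly row of Table I: difference 15.3, standard error 2.41.
print(round(critical_ratio(15.3, 2.41), 2))   # 6.35 (tabled as 6.30)

# Paper Form Board row: critical ratio 1.10.
print(round(chances_in_100(1.10)))            # 86, as tabled
```

Under the same assumption, a ratio of 2.70 gives 99.7 chances in 100, which matches the music row.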

Concerning the performance of the two groups on the only strictly 
verbal test used in the study, inspection of the data in the table above 
shows that the urban group excelled the rural group by approximately 
six score points. The standard error of this difference is but 1.67,
so that it can be assumed that the obtained difference does not differ
from the true difference by more than ±3 times 1.67 or ±5.01; hence
the obtained difference is statistically reliable. The central tendencies 
of the two groups may also be compared with the age norms supplied 
by Otis in order to ascertain whether their performance was equal to, 
above, or below the national norms. The mean age of the rural-urban
groups was eleven years, six months. The norm for this age
is thirty-five points. The median of the rural children was 35.25, 
while the urban children had a median score of 42.29. Thus, in terms 
of the test norms, the rural children were equal to the standardization 
group. The urban children, on the other hand, exceeded the norm 
by seven points. Shimberg, in her investigation of the validity of
norms for urban and rural groups found a similar condition to exist.* 
She contended that the type of items included in the ordinary verbal 
intelligence test tended to favor the urban subject. Hence, in making 
comparisons and judgments concerning the relative mental ability 
of groups of children selected from different localities the validity of 
the norms for those specific groups should be carefully examined. 
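
The reliability argument for the Otis difference rests on a band of three standard errors around the obtained difference. A minimal sketch of that arithmetic follows, using the figures quoted in the text (difference 6.3, standard error 1.67); the function name and the default multiplier are illustrative choices, not part of the original study.

```python
def three_se_band(diff, se_diff, k=3.0):
    """Return (low, high) limits lying k standard errors either side of the obtained difference."""
    margin = k * se_diff
    return diff - margin, diff + margin


low, high = three_se_band(6.3, 1.67)
print(round(low, 2), round(high, 2))   # 1.29 11.31

# The band excludes zero, which is the sense in which the difference is called reliable.
print(low > 0)                          # True
```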
This study of psychological differences of urban and rural children
indicates that several ability-differences exist. These are as follows:


* Casual questioning of a number of subjects, both urban and rural, as to what 
the object was on which they were working also was in agreement with this assump- 
tion. Most of the rural subjects indicated that they knew the name of or the use 
to which the object could be put. This was not so frequently encountered in the 
urban children. 
(1) There is a tendency for the rural children to be superior in
mechanical ability to the urban children of the same age, sex, and 
occupational level of parent. This difference is statistically reliable 
in the case of the Mechanical Assembly Tests and the Spatial Relations 
Tests (error scores). 

(2) For the groups used in this study, there is an indication of 
rural superiority in music ability as measured by the Kwalwasser- 
Dykema Tests. The difference of the means, however, is not reliable 
statistically. 

(3) The urban children are superior to the rural in verbal ability 
as measured by the Otis Test of Mental Ability. This obtained 
difference is statistically significant. The superiority of the urban 
child in verbal tests of intelligence tends to confirm the findings of 
previous investigations in this regard.*-* 

(4) In tests involving speed of performance, or in tests involving 
maximum time limits, the urban children tend to be superior to the 
rural children.’ 

(5) From the results of this study there is an indication that the 
common assumption that one regional group is intellectually superior 
or inferior to another is unjustified. Rather, the performance of the
different groups should be evaluated in terms of the degree to which 
they possess specific traits and abilities. This study supports previous 
investigations in finding urban children superior verbally. Mechani- 
cally, however, rural children score higher than similar urban groups. 
It is possible to assign the reasons for such differences to environmental 
demands and practices. 
READING GUIDED BY QUESTIONS VERSUS CAREFUL
READING FOLLOWED BY QUESTIONS 


SISTER MARY LAURENTIA GOLDEN, R.S.M. 


College Misericordia, Dallas, Pennsylvania 


The purpose of this study was to determine which of two types of 
reading performance was the more efficient procedure in the acquisition 
of learning. These two types were: (1) A type of reading performance
in which the subject was permitted to read first the question and then 
proceed to find the answer in the reading; (2) a type of reading per- 
formance in which the subject was permitted to read first the paragraph 
and then proceed to answer the questions. 

Since in the intermediate grades fluent reading habits are best 
cultivated, approximately three hundred fifth- and sixth-grade pupils 
of three parochial schools in the metropolitan area of New York City 
were originally selected for the experiment. Two hundred thirty- 
eight pupils (one hundred nineteen in each group), equated by the mean 
and standard deviation, actually participated in the experiment. 


TESTS 


The New Standard Reading Test, Form X, Grades II to IX, by 
Kelley, Ruch and Terman, and The Reading Achievement Test, Form A, 
Grades III to VI, Paragraph Meaning, by Durrell and Sullivan, were 
used for the investigation. 

The New Standard Reading Test was selected for equating the groups 
because: (1) A minimum of writing is required; (2) the scoring is rela- 
tively easy; (3) the items insure interest; (4) the reliability coefficient of 
the reading age is 0.95; (5) it has adequate norms. 

The Reading Achievement Test, Paragraph Meaning, was used 
because the problem being attacked called for a test of comprehension. 
The reliability coefficient is 0.94. The test consists of twelve para- 
graphs graded in difficulty. Comprehension of each paragraph is 
ascertained by five multiple-choice questions which measure five differ- 
ent aspects of reading ability. The test, too, is constructed so that it 
is not difficult to re-arrange it for the two types of reading performance 
used in the present study. 


METHOD OF PROCEDURE 


The experimental method using the parallel group technique was 
employed. The parallel group technique, as employed here, is a device 
to equate the pupils into two groups by the use of the mean and standard
deviation. The equated groups did not constitute an experimental
and a control group in a strict sense of the terms, for in an experimental 
group the crucial variable is present while in the control group, all 
other things being kept equal, the crucial variable is absent. Since in 
this experiment there is a crucial variable present in each group, the 
one group for the first reading performance—that is, reading guided 
by questions—will be referred to as Group G, and the other group of the 
second reading performance; namely, reading followed by questions, 
will be referred to as Group F. 


TABLE I. GROUPS EQUATED ACCORDING TO THE MEAN AND STANDARD DEVIATION

                    |               Group G                |               Group F
Grade               | 5A          | 5B   | 6A   | 6B   | Total | 5A   | 5B   | 6A   | 6B   | Total
Number              | 24          | 30   | 30   | 35   | 119   | 24   | 30   | 30   | 35   | 119
Mean standard score | 71.7        | 75.3 | 80.3 | 88.5 | 79.4  | 71.5 | 75.4 | 80.7 | 88.3 | 79.7
Standard deviation  | [illegible] | 13.3 | 12.3 | 9.9  | 12.9  | 11.4 | 13.5 | 12.3 | 9.6  | 13.5





The problem being to determine which of two types of reading 
performance was the more efficient procedure in the acquisition of 
learning—that is, reading guided by questions or reading followed 
by questions—it is evident that a different technique must be used 
for testing each reading performance. The Reading Achievement Test 
in its original form was given to Group G who were to do reading guided 
by questions. In addition to the regular directions given by the 
authors of the tests, the pupils who were to take this test under the first 
reading performance, which is really answering questions with the 
paragraph in sight and opportunity for re-reading, were given these 
directions: “Read the question before you read the paragraph. If you
do this, perhaps, you will not need to read the whole paragraph.” In 
doing the sample paragraph at the beginning of the test, this technique 
was particularly emphasized. 

Permission was granted by the authors of The Reading Achievement 
Test to reorganize the test for the second reading performance in which 
the pupils were to read first the paragraph and then turn the page to 
answer the five questions pertaining to each paragraph. The original 
form and the modified form of the test were identical except that on the 
original test the questions were on the same page, while on the revised 
form the questions were on the reverse side of the page. In addition to 
the regular directions prescribed by the authors for the administration
of the test, the following additional instructions were given: “Read the
paragraph carefully; when you have finished turn the page to answer 
the questions. Never turn back to read the paragraph again. Con- 
tinue to do this with the paragraph found on each page. Remember 
your success depends on how well you read the paragraph before you 
begin to answer the questions. Once the page is turned, it is too late to 
turn back.” During the entire test, which was administered in each
case by the writer, the principal of the school and the classroom
teachers were present.


RESULTS AND FINDINGS OF THE INVESTIGATIONS 


The following data indicate the results: 


TABLE II. THE MEAN, STANDARD DEVIATION, AND STANDARD ERROR OF GROUP G

Grade              | 5A   | 5B   | 6A   | 6B   | Total
Number             | 24   | 30   | 30   | 35   | 119
Mean               | 23.9 | 26.1 | 28.8 | 33.0 | 28.4
Standard deviation | 6.0  | 7.4  | 7.4  | 7.8  | 8.0
Standard error     | 1.2  | 1.3  | 1.3  | 1.3  | .73
















TABLE III. THE MEAN, STANDARD DEVIATION, AND STANDARD ERROR OF GROUP F

Grade              | 5A   | 5B   | 6A   | 6B   | Total
Number             | 24   | 30   | 30   | 35   | 119
Mean               | 25.8 | 27.7 | 30.0 | 34.4 | 29.9
Standard deviation | 7.4  | 10.4 | 8.4  | 7.2  | 9.0
Standard error     | 1.5  | 1.9  | 1.5  | 1.2  | .82














TABLE IV. THE CRITICAL RATIO FOR INDIVIDUAL GRADES AND TOTAL OF GROUPS
G AND F

Grade          | 5A  | 5B  | 6A  | 6B  | Total
Critical ratio | .96 | .69 | .66 | .81 | 1.3





As a further means of comparison of the two groups, the total num- 
ber of errors made in each group is presented in Table V. 
TABLE V. NUMBER OF QUESTIONS ATTEMPTED, NUMBER OF QUESTIONS ANSWERED
CORRECTLY, AND NUMBER OF ERRORS MADE IN GROUPS G AND F

       |                  Group G                   |                  Group F
Class  | Questions attempted | Correct | Errors | Questions attempted | Correct | Errors
5A     | 868                 | 569     | 299    | 923                 | 621     | 302
5B     | 1046                | 784     | 262    | 1064                | 823     | 241
6A     | 1214                | 869     | 345    | 1187                | 934     | 253
6B     | 1619                | 1157    | 462    | 1461                | 1204    | 257
Totals | 4747                | 3379    | 1368   | 4635                | 3582    | 1053























The percentage of accuracy which perhaps is a better expression of 
the results may be seen in Table VI. 


TABLE VI. NUMBER OF QUESTIONS ATTEMPTED, PERCENTAGE CORRECT AND
PERCENTAGE INCORRECT

       |                      Group G                       |                      Group F
Class  | Questions attempted | Per cent correct | Per cent incorrect | Questions attempted | Per cent correct | Per cent incorrect
5A     | 868                 | 65               | 34                 | 923                 | 67               | 32
5B     | 1046                | 74               | 25                 | 1064                | 77               | 22
6A     | 1214                | 71               | 28                 | 1187                | 78               | 21
6B     | 1619                | 71               | 28                 | 1461                | 82               | 17
Totals | 4747                | 71               | 28                 | 4635                | 77               | 22
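
The percentages in Table VI follow directly from the counts in Table V. The sketch below recomputes the totals row as a check. It assumes the percentages were cut to whole numbers without rounding up, an inference from the fact that each pair of tabled percentages sums to 99 rather than 100; the article does not say how they were rounded, and the function name is ours.

```python
def accuracy_percentages(attempted, correct):
    """Whole-number percentages correct and incorrect, truncated rather than rounded (assumed)."""
    errors = attempted - correct
    return 100 * correct // attempted, 100 * errors // attempted


# Totals row of Table V: (questions attempted, questions correct) for each group.
table_v_totals = {"G": (4747, 3379), "F": (4635, 3582)}

for group, (attempted, correct) in table_v_totals.items():
    print(group, accuracy_percentages(attempted, correct))
# G (71, 28)
# F (77, 22)
```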




















ANALYSIS AND INTERPRETATION OF FINDINGS


At first glance Tables II and III appear to show a tendency to favor 
the second reading performance; namely, reading followed by questions, 
since the mean of each grade in Group F is slightly higher than the 
mean of each grade in Group G. Tiegs,1 however, states that: “Two
or more classes doing the same work rarely obtain identical mean
scores on the same test, no matter how valid the test or similar the
groups. . . .” Since we were dealing with small groups, the difference
in means has little or no importance. The standard error further
confirms the fact that the results of the two reading performances
were relatively equal.

1 Tiegs, Ernest W.: Measurement in the Improvement of Learning. Boston:
Houghton Mifflin Company, 1939, p. 365.

The critical ratio is a device used to determine whether it is probable
that a difference between two means is due to chance; i.e., whether
it is likely that on repetition of the experiment essentially similar
results will be obtained. In Table IV there is a slight tendency favoring
Group F. However, according to Tiegs,1 “If the quotient is 3
or more we may infer that a significant difference exists; the larger
the quotient above 3, the more significant this difference. . . .”

In the present investigation the critical ratio is below 3 in each 
case, hence we assume that no significant difference exists. However, 
the differences obtained, though small, were consistent throughout 
all four subgroups, i.e., always in the same direction. This consistency
would seem to indicate at least a slight preference for reading
followed by questions.
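
The figures in Tables II to IV can be related to one another by a short calculation. The sketch below is a hedged reconstruction, not the author's stated procedure: it assumes the standard error of a mean is the standard deviation divided by the square root of the number of pupils, and that the critical ratio is the difference of the two means divided by the square root of the sum of the squared standard errors (the usual formula for independent groups). With the totals from Tables II and III it gives a ratio of roughly 1.36, near the 1.3 reported in Table IV; the per-grade ratios are reproduced only approximately. Function names are illustrative.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of a mean: standard deviation over the square root of the group size."""
    return sd / math.sqrt(n)


def critical_ratio(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means over the standard error of that difference
    (independent-groups formula, assumed here)."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / se_diff


# Totals from Table II (Group G) and Table III (Group F).
se_g = standard_error_of_mean(8.0, 119)
se_f = standard_error_of_mean(9.0, 119)
cr_total = critical_ratio(28.4, se_g, 29.9, se_f)

print(round(se_g, 2))     # 0.73, as tabled
print(cr_total < 3)       # True: below the criterion of 3 quoted from Tiegs
```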

In Table V there appears to be sufficient evidence that reading 
guided by questions is less favorable because of the greater number of 
errors made. It is quite possible that the pupils found the answers 
in this procedure by a sort of catchword method, that is, instead of 
reading the material assigned, they sought for some word that occurred 
in the question. Adults are all familiar with this process of looking 
for a certain word or information and realize how they follow the lines 
with no great amount of attention or thought until the material sought 
for is located. The smaller number of errors in Group F seemed to 
indicate that this group in reading the paragraph and then turning the 
page to answer the questions were motivated by the fact that they had 
to comprehend well the material read in order to answer the questions 
without referring back to the paragraph. It was noted, also, that 
when a pupil of Group F did poorly in answering the questions on one 
paragraph, the questions of the following paragraph were answered 
better, which was an indication that he had realized the necessity of 
careful reading and was putting it into practice. 

In Table VI the increase in percentage of accuracy in Group F
from one grade to the next is significant. It indicates that reading
followed by questions is superior, since it yields greater accuracy for
the same amount of time spent in reading; and that were this increase
to continue throughout the following grades, which it apparently
would, the second type of reading performance is to be preferred.

1 Ibid., p. 368.


CONCLUSIONS 


After performing the experiment and analyzing the data the 
investigator has drawn the following conclusions: 

(1) The means obtained by Group F are slightly higher in each 
grade than those obtained by Group G. These differences are not 
significant enough, however, to show that reading followed by ques- 
tions has a much greater effect on the learning process than reading 
guided by questions. 

(2) The critical ratio points out that the difference in the two
reading performances, although not statistically significant, was
consistent throughout the entire investigation, each time favoring reading
followed by questions.

(3) Reading followed by questions showed the fewer number of 
errors; in reading guided by questions 1,311 errors appeared, while 
in reading followed by questions 1,057 errors appeared. 

(4) The second reading performance appeared superior to reading 
guided by questions for there was an increase in the percentage of 
accuracy in each succeeding grade. The percentage of accuracy 
attained by the second group above the first group and the increasing 
percentage for each grade are the most significant differences
discovered in the two reading performances.

(5) The investigator is cognizant of the fact that experiments 
have been made in higher grades with results somewhat different 
from the present investigation. Washburne,1 for example, concluded
that poorly placed questions are worse than no questions at all, and 
according to his investigation the poorest place is at the end of a story 
or chapter. However, as in many of the other experiments on this 
topic, the subjects were college students. It is the opinion of the 
writer that in the intermediate grades the most worth-while habits of 
reading and of study in general are cultivated. Here the lines between 
reading and study must be broken down until to read means a complete 
understanding of what has been read. 


1 Washburne, J. N.: ‘‘The Use of Questions in Social Science Material.” 
Journal of Educational Psychology, Vol. xx, 1929, pp. 321-359. 








NOTE ON THE MEASUREMENT OF THE RESULTS 
OF ATTITUDE EDUCATION: AN AREA 
OF NEEDED RESEARCH 


C. H. PETERSON AND MARION L. FAEGRE 
Institute of Child Welfare, University of Minnesota 


It is gradually becoming recognized explicitly that the development 
of attitudes is a legitimate function of the public schools. Implicitly, 
this aspect of education has been inherent in the American philosophy 
of education. Perhaps the fact that it has been taken for granted has 
been one reason that little direct attention has been given to this area of 
learning. The realization that the fundamental issue involved in the 
present world conflict is one of opposed systems of attitudes has proba- 
bly been a factor leading to our increasing concern with attitude 
education. 

Along with this increasing awareness of this function of education, 
however, there has not been any concerted attempt to measure the 
results of the teaching of attitudes. In the subject-matter field, 
achievement tests have reached a high degree of perfection. In the 
field of attitudes, no one knows what kinds of attitudes are being taught, 
nor how they are taught. This is undoubtedly the result of the fact 
that the acquisition of attitudes, though admittedly of fundamental 
importance—even of greater importance than the absorption of subject- 
matter—has proceeded as an incidental aspect of education. 

Attitudes are being acquired, and will continue being acquired, by 
the young whether incidentally or as a result of planned instruction, 
and much of this learning process takes place in our schools. In the 
light of the importance of attitudes, it should behoove us to concern 
ourselves with the kinds of attitudes that are being acquired, and the 
manner in which they are being learned, in order that steps may be 
taken to assure that constructive attitudes are being developed, and 
that effective methods of instruction are being used. 

The Institute of Child Welfare of the University of Minnesota has 
for many years supplied speakers to interested high schools in the State, 
who have given talks to the students in the area of personality, social, 
and emotional development, including sex education. Just before one 
such series, the present writers decided to attempt to obtain some indi- 
cation of the results of the talks on the attitudes of the students. 
Although the idea occurred so soon before the talks that an adequate 
measuring instrument could not be prepared, a questionnaire of thirty-seven
items dealing with personality adjustment, attitudes toward
parents, and attitudes towards the opposite sex, was prepared. It was
administered at the beginning and at the end of a series of four talks,
the interval being five weeks.

The Journal of Educational Psychology

The analysis of the results of this preliminary experiment did not 
reveal any great changes in attitude, and is, therefore, not presented. 
The sex differences were greater and more significant than the changes 
from the first to the second administration. No control group was 
available to determine the change to be expected during the interval 
in the absence of the talks merely as a result of the stimulation of
having taken the questionnaire once. These factors (i.e., the inadequacy
of the questionnaire, the limited area covered, the lack of a control group,
and the small size of the group) preclude the drawing of
definite conclusions. It does seem apparent, however, that changes
probably do take place as a result of a series of extra-curricular talks 
and discussions. One interesting change in the case of the boys was an 
increase in the percentage who felt that it is hard not to worry over 
possible misfortunes. It is difficult to account for this shift, among 
others, though it has been suggested that it might have been related 
to the war. (The period was from February 12 to March 19, 1942; the
average age of the boys was seventeen years eleven months, SD 11.8 
months.) 

The repetition of the experiment is necessary, using more adequate 
measuring instruments, as well as a control group. With adequate 
instruments attitude teaching can be evaluated, and revised to develop 
the desired attitudes, and to avoid misunderstanding and misinter- 
pretation leading to undesired changes. 













BOOK REVIEWS 


LUTHER C. GILBERT AND DORIS W. GILBERT. Training for Speed and
Accuracy of Visual Perception in Learning to Spell: A Study of Eye
Movements. Berkeley, California: University of California Press,
Publications in Education, Vol. vii, 1942, pp. 351-462.


The purpose of this study is to investigate the effects of teaching 
spelling by a method which places a premium on speed and accuracy 
of visual perception. Fourth-, fifth-, and sixth-grade pupils were 
used. A control group learned by the method in regular use while 
the experimental group learned by a special method designed to 
increase speed and effectiveness of visual apprehension through the 
use of special workbooks. Evaluation was in terms of weekly tests, 
final tests, and eye-movement records taken before and after the eight 
weeks’ training period. In the experimental group, a word was viewed 
for approximately seven seconds, after which the word was written. 
As long as a pupil made an error in writing the word, the procedure was 
repeated. There were no spelling games, phonetic analysis or diction- 
ary work. Study before the eye-movement camera introduced no 
complicating effects. 

The final test results showed that the experimental group made 
slightly fewer incorrect spellings than the control group. Since the 
training was designed primarily to improve perceptual habits, most of 
the emphasis is placed upon analysis of eye movements. It is recog- 
nized, however, that the practical criterion is ability to spell correctly. 
There was no attempt to train the eye movements in studying, merely 
a limitation of study time, a check on spelling and motivation toward 
accuracy. For the experimental group the eye-movement records 
showed less study time per word, fewer fixations and regressions per 
word, and equal pause duration in comparison with the control group. 
Apparently, the experimental method had increased the recognition 
span, decreased attention to detail and increased discrimination with 
respect to individual study needs. 

With respect to eye movements, it seems that the obvious happened. 
Students adapted to the limited time allowed for studying a word in 
learning to spell. This necessarily was accomplished by fewer fixa- 
tions and the elimination of a certain amount of detailed inspection. 
The general pattern of cross-study, however, was maintained. Fur- 
thermore, the records show that the training resulted in discarding 
of much part-by-part study and a more mature attack on difficult 
spots. This indicated a unifying effort toward mastery of the word
as a whole. All this indicates a modified perceptual attack in studying
the words. 

It seems to the reviewer that greater emphasis should be placed 
upon the fact that limiting the time spent in visual examination of 
a word, when accompanied by good motivation, results in more effective
word study in learning to spell. Presumably this is largely due to 
heightened attention and consequently to an elimination of dawdling 
that frequently occurs during unlimited periods of study. The impli- 
cations of this report for educational practice are clear. In terms of 
economy of time and increased efficiency, the method of teaching 
spelling described here can be employed to good advantage. 

MILES A. TINKER.


University of Minnesota. 


E. L. THORNDIKE. The Teaching of English Suffixes. New York:
Bureau of Publications, Teachers College, Columbia University,
1941.


Because he believes he will have neither the time nor the facilities 
to complete a comprehensive teacher’s reference book on suffixes, 
Thorndike reports in this monograph “some of the facts that are ready 
and certain conclusions which the completed work would almost cer- 
tainly substantiate” (page 3). The body of the book contains the 
following information about each of some ninety of the commonest 
suffixes: 

(1) A code value (“commonness score”) to represent the number of
words of each degree of frequency of use in which the given suffix
appears. For example, the suffix “-ist” appears in: one hundred
ninety-one words outside the 20,000 most used words; thirty-five
words in the 16,000 to 20,000; twenty-five in the 12,000 to 15,000; and
so on;

(2) The various meanings of the suffix with a frequency count for
each;

(3) An index representing the estimated ease with which a typical
tenth-grade pupil could recognize that the word is composed of a
suffix plus some other word or part of a word;

(4) An index representing the estimated ease with which a similar
person could infer the meaning of the word from knowing the root and
the suffix meanings.
Items (3) and (4) above are substantiated by no experimental or
empirical data. Throughout the volume, Thorndike did not hesitate 
to make intelligent estimates when he considered absolute accuracy 
or objectivity to have little bearing upon the argument. Even a single 
meaning for the various roots and suffixes was frequently chosen 
subjectively from many meanings as they appeared in the New Century 
and Webster’s Collegiate Dictionary (page 18). 

To illustrate the inclusiveness of the description of the various
suffixes, the following information about “-ant” (p. 25) is reproduced
below:

(1) Approximately one hundred forty-three words end in this
suffix.

(2) Forty of these words have a commonness score of .5; twenty- 
three a commonness score of 1; twenty-four a commonness score of 2; 
twenty-four a commonness score of 3; five a commonness score of 4 or 
5; eight a commonness score of 6 or 7; three a commonness score of 8 or 
9; ten a commonness score of 10-19; and six a commonness score of 20 
or over. (The lower the commonness score, the more common the 
word.) 

(3) The estimated average analysis score (ability of tenth-graders 
to recognize “-ant” as a suffix) was 38 on a scale extending from 0 to
100. 

(4) The estimated average inference score (ability of tenth-graders 
to infer meaning of word from meaning of root and suffix) was 49 on a
scale extending from 0 to 100. 

(5) These varieties of meanings appear:

                                                          No. of Words
                                                          (Estimated
                                                          from Half)
a. [illegible] ........................................... 33
b. Xating, as in germinant, hesitant ..................... 13
c. Person or thing that Xs, as in claimant ............... 37
d. Person or thing that Xates, as in intoxicant ..........  4
e. [illegible] ........................................... 55


Throughout the publication and especially in Part IV, where
Thorndike makes his generalizations under two headings, “Teaching
English as a Vernacular” and “Teaching English as a Second
Language,” the admonitions given are generally wise and practical.

STEPHEN M. COREY.

University of Chicago.


ROBERT L. SUTHERLAND. Color, Class, and Personality. Washington,
D. C.: American Council on Education, 1942, pp. 135.


One-tenth of the American youth between the ages of sixteen and 
twenty-four are colored. What are the special personality problems 
which have confronted and do confront the various Negroes coming 
from various backgrounds in the American scene? In the volume 
called Color, Class, and Personality are summarized in journalistic 
style the results of studies concerning these problems made for the 
American Youth Commission. This is one of a series of volumes on 
the problem. Previous studies include Children of Bondage, a study 
of personality development in the urban South; Negro Youth at the 
Crossways, a study of the Middle States; Growing Up in the Black 
Belt, a study of Negro youth in rural areas; and Color and Human 
Nature, a study of personality in a northern city. In this volume these 
and supplementary studies are summarized or, more accurately, 
vividly portrayed. There is nothing academic or uninteresting about 
this book. The style is fluent, the description clear, the presentation 
vivid, the implications clearly expressed. In this volume the reader 
will not be burdened with statistics or with statistical considerations 
or with any academic concepts. It is a clear-cut portrayal of things 
as they are and changes to be made if we are to solve the problems of 
barriers to personality development due to color and caste. 

“Things as They Are” is the title of Part I of the book and in it are
described American Negroes who have shared the American dream and
those who have been isolated from it as part of the larger problem of
learning how to be black in a white world.

Generally speaking, the study attempts to be inclusive in the sense 
that it makes separate studies of colored youth from different areas; 
that is to say, it does not assume a biological conception of individual 
differences. Instead, it picks social situations which are known to 
produce personality effects on colored people and studies each area 
separately. The methods used include psychiatric case studies, life 
histories, clinic reports, questionnaires, tests, and ecological studies. 
The facts concerning previous studies are not reported. Only the 
generalizations are discussed, the social meanings are considered, 
the implications for personality growth are revealed. That there are 
facts is merely implied or expressed in a sentence, but the facts are 
not presented. This makes writing easier. And if the purpose of 
studies of this kind is to educate American people to know more about
the race problem in the United States by getting a clearer picture of
the difficulties of colored youth as they grow in the United States, the 
book is extremely successful in giving just such a picture. 

It is well for us to expect that all American people who have had 
American experiences should behave in the American manner. But 
there are many Negroes who have never known a society composed of 
respectable, law-abiding, industrious, self-reliant families whose 
ambition has been rewarded by good houses, electric refrigerators, and 
an improved social status. Incentives near at hand as far as these 
Negro people are concerned have nothing to do with anything that is 
supposed to build up the American attitude in the light of an American 
dream, as history books report it. And there are other Negroes,
also vividly enough described in this volume, who have some intimations
of a “better way of living,” through the movies, magazines, radio
programs, and schools. But here conflict is easily enough aroused in
the older colored boy, for example, because he discovers the whites
will be all too glad to help him get an educational background in a
high school if he is bright enough, but after he gets it he will be no 
better off than others who have had no such opportunities. Generally 
speaking, there is enough in an American attitude of life so that even a 
Southern gentleman will often support a bright colored boy through 
school but he will never consider giving him an opportunity after he 
has finished school. The bright colored boy who has that type of 
experience has more of a struggle with himself than the one who never 
had the dream and, therefore, was never disappointed in not attaining 
it. There are other backgrounds that make for large differences in 
attitudes and the differences they make are much more specific than 
would be implied by the two or three stereotypes that all too many 
Americans have of Negro groups. The particular stereotypes that
the author considers are that the colored are shiftless and lazy, that 
they have a chip on their shoulders, that they care little about school, 
that they produce no leaders, that they have more than their share on 
relief, that they waste their money on needless luxuries. The fact 
that there are plenty of white people who, on the basis of objective 
evaluation, could be more accurately described by these stereotypes 
than the colored is not considered. How judgment by merit instead 
of by stereotype can help realistic thinking about the colored youth in 
the United States is discussed in some detail in this book. 

All in all, this is an excellent little volume for the people who wish 
to inform themselves about the problem that is confronting the Negro
youth in the American scene and the social implications thereof. The
book is nicely put together, the print is very clear, the pictures selected
illustrative of attitudes are well done, and the book is not too big for a
pleasant evening’s reading. H. MELTZER.

Psychological Service Center, St. Louis, Missouri.


JOHN MURRAY AND OTHERS. Studies in Arithmetic. Vol. II.
Publications of the Scottish Council for Research in Education, 
xvi. London: University of London Press, 1941, pp. 218. 


This monograph consists of six experimental reports on vocabulary 
of arithmetic, methods of teaching subtraction, the zero in elementary 
arithmetic, types of errors in the basic number facts, an analysis of 
errors in fractions, and placement of topics in arithmetic. These 
studies are noteworthy for (1) significance of problems considered, 
(2) completeness of exposition, (3) analysis of related studies, (4) 
statements of implications for educational practice and curriculum 
changes, and (5) self-criticism by the authors. American educators 
will profit by a careful consideration of the data reported in these 
investigations. MILES A. TINKER.

University of Minnesota. 


LOVISA YOUNGS AYRES AND KENNETH RODUNER. Adolescent Voice
Ranges and Materials Published for Adolescent Voices. Eugene, 
Oregon: University of Oregon, 1942, pp. 50. 


The data for this study on voice ranges of junior-high-school 
students were compiled during the Winter of 1936-1937 by Kenneth
Roduner and the study made by Lovisa Youngs Ayres in the Summer of
1937. It was carried on by the University of Oregon in coöperation
with the Carnegie Corporation and just recently published. Meanwhile
other studies on the changing voice have appeared, with particular
emphasis on the mutation period in the male voice. All agree
on the danger of straining the larynx during this period. 

The voice of the adolescent has given concern to music educators 
for some time. It is at this age that most boys lose their child voices 
and acquire voices which are definitely masculine. The girls likewise 
lose their child voices and develop differentiated feminine voices; but 
this change is not so obvious or difficult to handle as the change in the 
boy voice. 

The two specific problems of this study were: 

(1) To determine the actual range of the unchanged, the changing, 
and the changed voice as found in the junior high school (two in
Eugene, Oregon: three hundred eighty boys, three hundred ninety-nine
girls).

(2) To examine available materials published for these voices and 
to determine the suitability of such materials for the ranges found. 

The ranges were found to be quite restricted and the song books 
generally used had considerable unsuitable material, with the exception 
of The Troubadour Series. Grouping by grades is very unsatisfactory: 
no one type of arrangement fits such a mixed group; unison songs are 
quite unfitted to this group as a whole. Where school schedules per- 
mit, it would be very helpful to select classes with reference to range 
unity within the voice classification, and more careful selection of 
music for this age group is necessary. Teachers who handle adolescent
voices should have not only a reliable knowledge of the child voice, 
but also training in harmony and composition to enable them to make 
song arrangements suitable to the groups they direct. 

The reviewer suggests further research in dynamics, pitch skips, 
duration values of notes, etc. in order to prevent what Dr. Leo Kallen, 
well-known New York otolaryngologist, calls “singing which is an
educational crime.” LOUIS CHESLOCK.

Peabody Conservatory of Music. 


IRVING E. BENDER, HENRY A. IMUS, JOHN W. M. ROTHNEY, CAMILLA
KEMPLE AND MARY R. ENGLAND. Motivational and Visual
Factors. Hanover, N. H.: Dartmouth College Publications, 1942,
pp. 369.


The notion that visual efficiency is to some degree a determinant 
of reading and scholastic proficiency has stimulated much research. 
In general, the findings have been negative or equivocal. Neverthe- 
less, most writers on the subject continue to believe that visual status 
does play a part in school adjustment, particularly in certain cases. 
Such a conviction is held by the authors of this report. “Yet our
search and research in the college population shows clearly that the 
way the individual with relatively minor defects reacts to his world 
is the function of what he is within, rather than the result of his visual 
errors alone. The visual defects find their place in his particular 
pattern of living in accord with the importance that the defects have 
for him—the kind of conflict they present to him.” Although certain
statistical comparisons were made in this study, most of the emphasis 
is upon study of the individual. Intensive case studies were made of 
one hundred twenty-four Dartmouth students. Twenty representative
cases are presented in detail. The stated aim was to
discover the place of visual factors in the motivation of the particular
case.

Most of the data were derived from a complete visual examination, 
scholastic achievement records, college ability scores, reading ability 
scores, scores on Graduate Record Examination, an attitudinal
biography and a mass of psychological data gathered during interviews.
The pattern of these interviews draws heavily upon the views and 
techniques of both Allport and Murray. 

After presenting the statistical results which, as in other studies, 
revealed no relation between visual factors and either college grades 
or test scores, the authors move on to the main part of their report— 
presenting the case reports. It is again emphasized here that the 
cardinal aim is to discover the place of visual factors in the motivation 
of each case. The case report included a general summary of educa- 
tional experience along with certain test scores, the visual data, the 
psycho-portrait and the rôle of the visual factor. Most space was
devoted to the psycho-portrait which was concerned with various 
traits, attitudes and values of the student’s motivational patterning. 
The influence of the visual factor is inferred from the student’s rational- 
izations concerning its effect upon him and the relief gained from his 
visual correction. 

The effort to demonstrate that visual status is an important fac- 
tor in the motivational pattern seems forced and at times unconvinc- 
ing. Apparently some students had difficulty in finding a rational 
connection between their visual condition and their behavior. Thus, 
one student felt that his glasses might have helped to overcome his 
stammering. Another considered that his glasses gave complete 
insurance against careless reading that might result from tired eyes. 
In certain cases, visual status undoubtedly is involved in the motiva- 
tion pattern as a minor factor. The patterns revealed in the psycho- 
portraits, however, are much more important. Each portrait is an 
interesting and revealing document. 

Many will agree that study of the individual, as outlined here, does 
contribute to the understanding of personality. Nevertheless, such 
a technique probably should supplement rather than become a sub- 
stitute for other well-established methods of studying personality and 
adjustment. MILES A. TINKER.

University of Minnesota.